Experimental psychology now conceives reading and spelling to
be not single, unitary processes but a number of activities highly
integrated. Like the smooth flow of power from an automobile,
however, fluent reading or spelling is the result of a complex
organization of delicate mechanisms that must work together in perfect
coordination. Defects or deficiencies of any of the mechanisms may
disturb or inhibit normal activity. The total function may be best
understood, and consequently troubles may be prevented or remedied
and high efficiency secured, by an understanding of the constituent
mechanisms and their relations.
There are several ways in which the investigator may seek to
disentangle and appraise the mechanisms and functions upon which
good reading or spelling depends. (1) One may study the especially
poor reader to ascertain what, if any, of the organs or functions are
defective or deficient; (2) one may study cases known to be especially
deficient in some organ or process (the deaf, astigmatic, or very dull,
for example) to discover the corresponding deficiency, if any, in the
scholastic ability; (3) one may study similarly those of especially good
scholastic ability; or (4) those especially gifted in memory or
intelligence, etc.; and (5) one may study a group of representative
persons, that is, a group including proper portions of good, bad and
indifferent abilities, attempting to discover the correlations between
scholastic proficiency on the one hand and special organic and
functional excellence on the other. In this paper some data gathered
by means of the last method are to be presented.

The study comprises the construction of a number of tests designed,
either singly or in combination, to measure certain mental functions
or capacities such as perception and association. The series of tests
was given to representative learners (about 310 school children,
Grades I to VI, in all) and the results analyzed by the method of
correlation.
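
As a point of reference for the coefficients reported throughout, the product-moment correlation can be sketched as follows. This is a minimal illustration only: the function name `pearson_r` and the six pairs of scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each score from its own mean
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores for six pupils (illustrative values only)
word_perception = [12, 15, 9, 20, 17, 11]
reading = [30, 34, 25, 41, 39, 27]
r = pearson_r(word_perception, reading)
print(round(r, 2))
```

A coefficient near 1 indicates that pupils keep nearly the same rank order on both tests; a coefficient near 0 indicates little association.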


THE TESTS UTILIZED


To make the discussion of results intelligible, a brief description of 
the tests and the functions assumed to be tested by these instruments 
must be given. 


I. Tests of visual perception.

(A) These tests, alike in general form, require the subject to
perceive small differences between pairs of visual
items, encircling each pair when the members are
not alike. Score is number of correct reactions minus
three times errors and omissions.

Test 1. A page of 60 pairs of geometrical figures.

Test 2. A page of 90 pairs of series of digits.

Test 3. A page of 105 pairs of words.

(B) Two tests, requiring the selection from a block of five
or more items the one which is identical with a sample
given at the left of the block. Score is number
correct.

Test 4. A page of 42 blocks of geometrical designs.

Test 5. A page of 36 blocks of words varying from 3 to 11
letters.

(C) Test of speed of visual perception of drawings of common
objects.

Test 6. A page of small, clear line drawings of 77 common
objects (boy, hat, cup, etc.) to be named as rapidly
as possible. Score is time required.

II. Capacity in associative learning; auditory-visual association.

Test 7. Twelve cards on each of which is a geometrical 
figure to be presented simultaneously with a spoken 
word. Each figure is displayed for three seconds while 
the word is spoken twice. Cards are shuffled and pre- 
sented with the spoken word again. Final test con- 
sists in presenting the visual figure while the subject 
writes the associated word. This test is similar to 
the learning of printed words—which are at first 
merely complex visual figures—when the words are 
pronounced by a parent or other teacher.

III. Capacity in associative learning; visual-visual association.

Test 8. Twelve cards, each containing a nonsense word
and a geometrical drawing. Two exposures as in
Test 7. Test by presenting figures and calling for
reproduction of the nonsense words. This test is
similar to learning new words in connection with
pictures, familiar words, or other visible signs. Both
of these tests were designed to measure ease of forming
associative connections on the assumption that similar
associations are involved in learning to read.

IV. Capacity for general linguistic and abstract learning. 

Test 9. Mental age as determined by the Stanford-Binet 
Intelligence Scale. 

V. Tests of reading abilities. 

Test 10. Speed of recognition and pronunciation of easy 
words. Speed of accurate reading of a series of 200 
familiar two- and three-letter words. 

Test 11. Level of recognition and pronunciation of words. 
A series of words, graded in difficulty from easy to 
hard. The score—total number pronounced correctly 
—indicates the highest level of difficulty mastered. 

Test 12. Rate of accurate comprehension of easy reading 
material. Courtis Silent Reading Test, No. 2; a 
series of equally difficult paragraphs and questions. 

VI. Test of spelling ability. 

Test 13. A series of 36 words ranging in difficulty from "me"
to "miscellaneous" and "conscientious."
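
The scoring rule stated for Tests 1 to 3 can be sketched as a small function. The reading of "three times errors and omissions" as three times the sum of errors and omissions is an assumption, and the function name `comparison_score` and the counts in the example are hypothetical.

```python
def comparison_score(correct, errors, omissions, penalty=3):
    """Score for the pair-comparison tests (Tests 1 to 3).

    Assumed reading of the rule: correct reactions minus `penalty`
    times the combined number of errors and omissions.
    """
    if min(correct, errors, omissions) < 0:
        raise ValueError("counts must be non-negative")
    return correct - penalty * (errors + omissions)


# Hypothetical pupil: 40 correct reactions, 2 errors, 1 omission
score = comparison_score(correct=40, errors=2, omissions=1)
print(score)  # 40 - 3 * (2 + 1) = 31
```

The heavy penalty discourages indiscriminate encircling, since each wrong or missed pair costs three correct ones.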

Twelve of these tests (Test 6 was given to about 75 pupils in
Grades II, III and IV only) were given to the following
school children, either individually or in small groups
where the nature of the test permitted:

Group 1. Grades I and II, 75 pupils.

Group 2. Grades III and IV, 90 pupils.

Group 3. Grades V, VI and VII, 144 pupils.

Three sets of intercorrelations were computed, one for each of the
three groups above. Since in each group there is some, and perhaps
an unequal, variation in age, the obtained correlations will tend to
be higher than those which would have been secured from groups
with age constant, and the increases in correlations may be unequal
in the several groups, but for most of our purposes these variations
will not matter appreciably inasmuch as we shall be interested in
comparing one coefficient with others obtained from the same group.
Where other comparisons are desired, appropriate corrections will
be made.

1 See a description of this type of test in the Teachers College Record, November,
1924.


PERCEPTION IN READING 


We shall first consider the nature of visual perception and the 
role it appears to play in reading. 

Tests 1 to 5 inclusive are obviously measures of speed and accuracy 
in the perception of different kinds of items. There were two forms of 
reaction required; the comparison of two members of a pair of items, 
and the selection of a sample item from a larger number, all except 
one different in some detail. There were three kinds of items used, 
namely, geometrical figures, digits and words. With correlations 
between these tests available, it becomes possible to analyze, in some 
measure, the nature of perception; how uniformly it operates on differ- 
ent kinds of material and how constant it is when, with materials either 
the same or different, a change in the precise form of the perceptive 
reaction is demanded. Such knowledge concerning the nature of
perception is necessary before we attempt to speak of the role of
perception, in general, in reading.

First, the constancy of perceptive ability in quite different situa- 
tions when both the content and form of perception are not exactly 
alike, is indicated by the data of Table I. The average of these cor- 
relations is low; combining results for all tests and all groups, the coeffi- 
cient is 0.35. Were the factor of age eliminated—as was done in a 
few representative comparisons—the average correlation would be 
somewhere around 0.30 which is a very low positive association. This 
result may be interpreted as follows: What we call visual perception 
is not a single, unitary capacity or power which operates uniformly 
upon all sorts of data and under all conditions; perception, on the 
contrary, is specialized. Each person perceives some things better 
than others. In connection with the present problem, it must be said 
that a person who perceives poorly (or well) non-verbal items will not 
necessarily perceive words poorly (or well); nor will the person who 
perceives poorly (or well) in reading surely perceive similarly other 
data. Perception, as it functions with words as data, then, is rather a
special kind of perception and in the majority of cases it cannot be
predicted at all accurately from knowledge of other types of perception.

TABLE I. CORRELATIONS BETWEEN PERCEPTION TESTS NOT IDENTICAL IN EITHER
FORM OR CONTENT

Tests correlated              | Group 1 | Group 2 | Group 3
(1) Figures and (5) words     |  0.37   |  0.31   |  0.37
(2) Digits and (4) figures    |  0.31   |  0.37   |  0.31
(2) Digits and (5) words      |  0.42   |  0.55   |  0.55
(3) Words and (4) figures     |  0.27   |  0.07   |  0.12
Average                       |  0.34   |  0.38   |  0.34
Average of column averages    |         |         |  0.35

A relevant problem is whether the kind of perceptive response 
varies more with changes in the material perceived or with the way 
in which the items are apprehended. Among the correlations, it is 
possible to secure four instances in which the form of the tests was 
the same while the contents differed and two cases in which the 
contents were the same (strictly speaking, similar) while the form of 
the reactions differed. The results of these comparisons are given 
in Table II. Although the number of comparisons is too small to
give results more than suggestive, the indication is that efficiency in
perception is determined more by the material perceived than by the
form of the perceptive response. The suggestion is, in other words,
that differences in reading ability may depend considerably upon
the effectiveness of the specific skills pupils have of perceiving a certain
kind of material, namely, printed words, and that in diagnosis much
attention should be given to the methods of perceiving words in
typical test situations.

TABLE II

(A) Correlations between Tests Alike in Form but Different in Content

Tests, number and description                        | Group 1 | Group 2 | Group 3
(1) Pairs of figures and (2) pairs of digits         |  0.38   |  0.43   |  0.45
(1) Pairs of figures and (3) pairs of words          |  0.44   |  0.30   |  0.41
(2) Pairs of digits and (3) pairs of words           |  0.54   |  0.47   |  0.57
(4) Selection of figures and (5) selection of words  |  0.34   |  0.14   |  0.26
Average                                              |  0.44   |  0.34   |  0.42
Average of column averages                           |         |         |  0.40

(B) Correlations between Tests Alike in Content but Different in Form

Tests, number and description                        | Group 1 | Group 2 | Group 3
(1) Pairs of figures and (4) selection of figures    |  0.40   |  0.20   |  0.47
(3) Pairs of words and (5) selection of words        |  0.65   |  0.83   |  0.60
Average                                              |  0.53   |  0.52   |  0.54
Average of column averages                           |         |         |  0.53
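
The comparison drawn from Table II can be retraced with a short calculation. The sketch below averages the coefficients of each part of the table by group and then across groups; the dictionary layout and the function name `grand_mean` are illustrative choices, and small differences from the printed averages are to be expected from rounding.

```python
from statistics import mean

# Coefficients as given in Table II, in the order Group 1, Group 2, Group 3.
same_form_different_content = {
    "1 and 2": [0.38, 0.43, 0.45],
    "1 and 3": [0.44, 0.30, 0.41],
    "2 and 3": [0.54, 0.47, 0.57],
    "4 and 5": [0.34, 0.14, 0.26],
}
same_content_different_form = {
    "1 and 4": [0.40, 0.20, 0.47],
    "3 and 5": [0.65, 0.83, 0.60],
}


def grand_mean(table):
    """Average each group column, then average the column averages."""
    columns = zip(*table.values())
    return mean(mean(col) for col in columns)


form_mean = grand_mean(same_form_different_content)
content_mean = grand_mean(same_content_different_form)
print(round(form_mean, 3), round(content_mean, 3))
```

The average for tests alike in content comes out higher than the average for tests alike in form, in agreement with the indication stated above.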

This indication is an important one since it embraces several
implications concerning the sources of difficulty in reading and
learning to read. It may be tested in another way, from the correlations
at hand, by a larger number of comparisons: by comparing the
correlations between all of our tests in which words are used in both
tests correlated (no matter what the test requires the subject to do
with words) with all of the tests in which the same kind of materials
are not used in the tests compared. This procedure
should be carried out so as to yield in the aggregate the same forms
of reactions in both cases but in one, only words as content and in the
other various materials. These results are shown in Tables III and
IV.

While it is possible that some of the differences between these two 
arrays of correlations are due to differences in the reliability of instru- 
ments used, there can be no doubt that in the main the differences are 
due to the character of the materials to which the subjects reacted in 
the test. When the materials for both tests correlated are words the 
average of the coefficients is 0.685; when one test is made of words and 
the other of other materials, the average is 0.245. These results will
be understood when we change our terms to correspond to the
psychological facts. "Words" are the material of the test in the first
case but, psychologically, they are only items to which the pupils
react in taking the tests. The psychological data, in other words, are
the reactions and not the physical printed words. The facts then
should be expressed as follows: In these various tests of perceptive
activities, there is much in common among the reactions made when 
the items are words in various settings but there is little in common 
among the perceptive responses to different kinds of items even when 
the general test situations are similar. 

TABLE III. CORRELATIONS OF TESTS IN WHICH THE MATERIALS ARE THE SAME,
I.E., WORDS

Tests          | Group 2 | Group 3
3 and 5        |  .83    |  .60
3 and 10       |  .64    |  .44
3 and 11       |  .70    |  .48
3 and 12       |  .74    |  .69
3 and 13       |  .65    |  .55
5 and 10       |  .49    |  .48
5 and 11       |  .83    |  .41
5 and 12       |  .72    |  .56
5 and 13       |  .65    |  .42
10 and 11      |  .81    |  .57
10 and 12      |  .53    |  .41
10 and 13      |  .61    |  .52
11 and 12      |  .50    |  .42
11 and 13      |  .78    |  .79
12 and 13      |  .37    |  .25
Average        |  .66    |  .51
Grand average  |         |  .685

TABLE IV. CORRELATIONS OF TESTS IN WHICH THE MATERIALS ARE NOT THE SAME

Tests          | Group 2      | Group 3
[illegible]    |  .43         |  .41
[illegible]    |  .31         |  [illegible]
1 and 10       |  .03         |  [illegible]
1 and 11       |  .25         |  [illegible]
1 and 12       |  .13         |  [illegible]
1 and 13       |  .32         |  [illegible]
2 and 3        |  .47         |  .57
2 and 5        |  .55         |  [illegible]
2 and 10       |  [illegible] |  [illegible]
2 and 11       |  .40         |  [illegible]
2 and 12       |  .34         |  .37
2 and 13       |  .38         |  [illegible]
4 and 3        |  .07         |  .12
4 and 5        |  .14         |  .26
4 and 10       |  .02         |  .03
4 and 11       |  .03         |  .08
4 and 12       |  .06         |  .14
4 and 13       |  .10         |  .07
[illegible]    |  .37         |  .10
[illegible]    |  .33         |  .22
[illegible]    |  .37         |  .07
[illegible]    |  .40         |  .06
[illegible]    |  .12         |  .11
[illegible]    |  .30         |  .13
Average        |  .27         |  .22
Grand average  |              |  .245

While these comparisons give us important facts concerning the
nature of perception in general and also of the perception of words,
they do not alone prove that the types of word-perception here
measured are important factors in actual reading or that the other types
of perceptive response do not enter into normal reading. In Table
V appear the correlations of all the tests with reading.

TABLE V

Column key: Test 10, speed of pronunciation of short words; Test 11,
level of pronunciation of words; Test 12, rate of comprehension of
paragraphs; Test 13, spelling.

Tests                    | Group | Test 10      | Test 11      | Test 12      | Average of reading tests | Test 13
(3) Pairs of words       |   2   |  .64         |  .70         |  .74         |                          |  .65
                         |   3   |  .44         |  .48         |  .69         |                          |  .55
    Average              |       |  .54         |  .59         |  .72         |  .62                     |  .60
(5) Selection of words   |   2   |  .49         |  .84         |  .72         |                          |  .65
                         |   3   |  .48         |  .41         |  .56         |                          |  .42
    Average              |       |  .49         |  .63         |  .64         |  .59                     |  [illegible]
(1) Pairs of figures     |   2   |  .03         |  .25         |  .13         |                          |  .32
                         |   3   |  [illegible] |  [illegible] |  [illegible] |                          |  [illegible]
    Average              |       |  [illegible] |  [illegible] |  [illegible] |  [illegible]             |  [illegible]
(2) Pairs of digits      |   2   |  .46         |  .40         |  .46         |                          |  [illegible]
                         |   3   |  [illegible] |  .26         |  [illegible] |                          |  [illegible]
    Average              |       |  [illegible] |  [illegible] |  .47         |  .40                     |  .37
(4) Selection of figures |   2   |  .02         |  .03         |  .06         |                          |  .10
                         |   3   |  .03         |  .08         |  .14         |                          |  .07
    Average              |       |  .00         |  .02         |  .10         |  .03                     |  .08

Only the data for Groups 2 and 3, in which reading was more
advanced, are considered. These correlations suggest that the
perception tests utilizing digits and various printed figures activate
reactions that exert little influence on reading and spelling, whereas
those perception tests which utilize words depend on reactions that are
very probably important factors in both reading and spelling. It
is possible, however, that a certain minimum of perceptive ability
of all these types, that is, visual perception in general, is necessary
but that above this amount, additional perceptual aptitude exerts
very little influence upon reading achievement. Study of the scatter
diagrams, however, shows little evidence of a critical point; the
regression lines were quite rectilinear. Among those in the lowest quintile
in the average of the non-verbal perception tests were good as well as
poor readers, though the latter were somewhat more numerous, as
would be expected from rectilinear regressions of data showing any
degree of positive correlation. This may mean (eliminating defects
or deficiencies in the sensory apparatus) that very low levels of
perceptive abilities are sufficient for good reading. This point will be
pursued further, however, in the results of examinations of groups of
poor readers to be presented in another study.

Only the tests utilizing words as materials give substantial
correlations with reading and spelling. If these tests measure what they
were designed to gauge, namely, sheer perception of words, then it
may be inferred that such perceptive skill is an important factor in
reading and spelling. There is a possibility, however, that the
obtained correlations are due to the mediating effects of some other
abilities, such as general intelligence, which influence both the word
perception and the reading and spelling abilities. This possibility
should be tested.

Table VI shows the correlations between the Stanford-Binet Mental
Age and several other tests. Mental age is correlated most highly with
reading, spelling and the other verbal tests. The correlations with
the non-verbal tests are low. It is, therefore, probable that some of
the association between reading and spelling and the word perception
tests is due to the mediating influence of intelligence since it has been
shown that all of these functions are intercorrelated. It is,
consequently, advisable to ascertain the correlations between the perception
tests and reading and spelling after the influence of intelligence is
removed.

TABLE VI. CORRELATIONS OF MENTAL AGE AND OTHER TESTS INDICATED

Tests                                         | Group 2 | Group 3 | Average
MA and 12, silent reading                     |  .52    |  .48    |  .50
MA and 13, spelling                           |  .35    |  .46    |  .41
MA and 11, level of pronunciation of words    |  .36    |  .24    |  .30
MA and 10, speed of pronunciation of words    |  .35    |  .23    |  .29
MA and 3, pairs of words                      |  .31    |  .24    |  .28
MA and 5, selection of words                  |  .22    |  .20    |  .21
MA and 1, pairs of figures                    |  .23    |  .09    |  .16
MA and 2, pairs of digits                     |  .17    |  .13    |  .15
MA and 4, selection of figures                |  .18    |  .05    |  .12

The technique of partial correlation was applied to a sufficient
number of the combinations to yield representative results, as shown
in the following:

r of silent reading (12) and word perception (5), intelligence
eliminated .......................................... 0.69
r of level of pronunciation of words (11) and word perception
(3), intelligence eliminated ........................ 0.555
r of spelling (13) and word perception (3), intelligence
eliminated .......................................... [missing]

When the influence of intelligence is eliminated, the correlations
of reading and spelling with a single sample of word perception remain
relatively high. These partial correlations are higher than the raw
correlations of intelligence and reading, pronunciation or spelling
which were .50, .30 and .41, respectively. These figures imply that
the perceptive factor, irrespective of intelligence, is more closely
associated with reading and spelling than all of the functions embraced
in "intelligence" as measured. It should be noted at this point that
the intelligence test used, the Stanford-Binet, includes practically
no reading; it was selected especially to avoid the confusion which an
intelligence test including reading would necessarily entail.
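
The elimination of intelligence referred to here corresponds to the standard first-order partial correlation, r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below applies it to the level of pronunciation and word perception; it assumes that the averaged coefficients of Tables V and VI are the inputs, which the text does not state, and the function name `partial_r` is illustrative.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    denom = sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("undefined when r_xz or r_yz equals 1 in absolute value")
    return (r_xy - r_xz * r_yz) / denom


# Averaged coefficients as read from Tables V and VI (assumed inputs):
r_pron_percep = 0.59  # level of pronunciation (11) with pairs of words (3)
r_pron_ma = 0.30      # level of pronunciation (11) with mental age
r_percep_ma = 0.28    # pairs of words (3) with mental age

result = partial_r(r_pron_percep, r_pron_ma, r_percep_ma)
print(round(result, 3))
```

Under these assumed inputs the result falls close to the 0.555 given above, which suggests, without proving, that the averaged coefficients were the ones used.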

In addition to eliminating intelligence, the factor of age should be
"partialled out" but, since the correlation of both intelligence and
word-perception with age among our groups is very low, this operation
would bring about a scarcely perceptible decrease in the coefficients
last given.

The facts here presented agree substantially with those secured in
an earlier study.¹ Combining the results of both investigations, the
returns obtained from approximately 450 subjects, in various grades,
taught in three different schools, tested by different examiners with
two batteries of tests largely different, point to similar conclusions,
namely: (1) Visual perception varies widely according to the kind of
items perceived and less widely according to the form of the perceptive
response when the kind of items remains constant; and (2) ability to
perceive word-forms (with the influence of age and intelligence
eliminated) is substantially associated with reading and spelling ability.

Precisely what the nature of such "word-perception" ability
is (whether due primarily to specific native aptitude or to acquired
skill; whether largely a function of general reading experience and skill,
or the result of special practice such as may be provided by flash-card
exercises, phonetic drill, word analysis, etc.) must be determined by
investigations of other types. The facts here shown indicate at least
that the discovery of the nature and causes of differences in
word-perception would be of great practical importance.


THE INFLUENCE OF GENERAL INTELLIGENCE 


The Stanford-Binet Mental Age, it was found, is correlated with
reading ability, as measured, to the extent of approximately 0.50 (see
Table VI); with spelling the r is 0.41. Eliminating from these
associations the influence of word-perception, the partial r's are:
Intelligence with silent reading, 0.47; with spelling, 0.31. Eliminating age
would further decrease the coefficients by amounts barely sensible.
These associations are too familiar to need comment except, perhaps,
to note that they are lower than the correlations of word-perception
and reading or word-perception and spelling.

¹ A. I. Gates: "The Psychology of Reading and Spelling with Special
Reference to Disability." New York: Teachers College Bureau of Publications, 1923.


THE INFLUENCE OF CAPACITIES FOR ASSOCIATIVE LEARNING


Learning to read and spell involves, in addition to perceptive
analysis, the formation of associative connections between one visual
item and another, as when an unfamiliar word is acquired in context
or in association with a synonym, picture, or diagram, and between a
visual and an auditory item, as when a printed word is learned by
presenting it simultaneously with a spoken word or when the spelling
is recalled in response to the spoken word. Inabilities of children to
learn to read or spell have frequently been diagnosed as due not
to deficiencies in general intelligence or general learning ability but to
a specific deficiency in the associative mechanisms: deficiencies in
"auditory memory" or "visual memory," "inability to associate
auditory and visual symbols," congenital or acquired "word-blindness,"
or defective "brain areas," "association tracts," etc. Tests 7
and 8 were introduced to yield a measure of such capacities. In
Table VII the correlations of these tests with others are given.

TABLE VII

Correlations of Test 7, Association of Auditory and Visual Symbols, With

Test                                   | Group 2 | Group 3 | Average
[illegible]                            |  .12    |  .11    |  .12
11, Level of pronunciation             |  .40    |  .06    |  .23
10, Speed of pronunciation             |  .03    |  .07    |  .02
[illegible]                            |  .30    |  .13    |  .22
[illegible]                            |  .02    |  .18    |  .08
[illegible]                            |  .23    |  .11    |  .17
4, Perception of figures               |  .18    |  .05    |  .12
8, Associative, visual-visual          |  .56    |  .54    |  .55
9, Intelligence                        |  .29    |  .35    |  .32

Correlations of Test 8, Association of Visual and Visual Symbols, With

Test                                   | Group 2 | Group 3 | Average
[illegible]                            |  .08    |  .06    |  .07
11, Level of pronunciation             |  .31    |  .20    |  .26
10, Speed of pronunciation             |  .03    |  .02    |  .03
[illegible]                            |  .40    |  .08    |  .24
[illegible], Perception of words       |  .32    |  .00    |  .16
[illegible], Perception of words       |  .33    |  .22    |  .28
[illegible], Perception of figures     |  .03    |  .17    |  .10
[illegible], Perception of digits      |  .28    |  .09    |  .19
4, Perception of figures               |  .06    |  .09    |  .08
7, Associative, auditory-visual        |  .56    |  .54    |  .55
9, Intelligence                        |  .30    |  .37    |  .34

Before discussing these correlations, it should be mentioned that the 
association tests were designed to depend very little upon speed or 
power to perceive printed or oral words since but 12 very familiar 
words were used and abundant time for perception given. It will 
be seen, in the table, that the correlations of the association with the 
perception tests are low. 

The highest correlation is between the two associative-learning 
tests themselves, 0.55; the next is with intelligence, 0.32 and 0.34; the 
correlations with reading, pronunciation, spelling and word-perception 
are low. 

Partial correlations have not been undertaken to demonstrate 
more fully the inter-relations of these functions since the facts may be 
observed by inspection of the coefficients in Table VII. It is apparent 
that these particular tests of associative learning, except insofar as 
they measure the same abilities as the test of general intelligence, are 
not closely associated with levels of ability in reading and spelling 
among the pupils studied. Whether there exist, as some specialists
have long maintained, occasional cases of a specialized defect or
deficiency, "congenital" or acquired, which inhibits progress in
connection with reading and spelling; whether a certain minimum of
associative ability is a sine qua non but higher amounts of little more 
service; whether the tests here used are or are not measures of the 
exact type of associative capacity utilized in learning to read are all 
important questions worthy of study. 


SUMMARY 


Of the several abilities studied, that termed "word-perception" is
most closely associated with achievement in reading and spelling;
intelligence yields the next highest correlation whereas tests of
perception of geometrical figures of different sorts and digits, of
associative learning of visual and auditory symbols, or of visual and visual
symbols show but slight association with these school abilities. While
all these capacities or abilities merit further study, the results indicate 
that further investigations of the nature of word-perception and its 
relation to reading and spelling achievement are likely to be most 
fruitful. 

We need especially to discover the varieties of word-perception;
the characteristics of effective and ineffective types; whether these
various forms are primarily due to native aptitude or to acquired 
attitudes or skills; whether they may be readily changed and improved; 
how they are or may be influenced by various types of training—such 
as flash-card drill, efforts to read rapidly, phonetic practices, syllabi- 
cation, articulation or other less familiar but possible forms of train- 
ing; whether the forms of word-perception most effective in reading 
are also most productive in spelling and so on. Some of these prob- 
lems are now the subjects of research. 








THE SCORING OF INDIVIDUAL PERFORMANCE 
L. L. THURSTONE 
University of Chicago 


The object of the present study is to develop a method of
determining the score of an individual subject under conditions that frequently
arise in the construction of a mental test. The method should prove 
applicable to mental age scales and also to educational test scales. 
It allows considerably more freedom in the form of test construction 
and more freedom in the procedure for giving tests than the individual 
test methods in current use. 

The specific requirements that this study proposes to satisfy are 
briefly as follows: 

1. It should not be required to have the same number of test ele- 
ments at each step on the scale. 

2. It should be possible to omit several test questions at different 
levels of the scale without affecting the individual score. 

3. It should be possible to include in the same scale two forms of 
test, namely (a) those that are scored either right or wrong by the all- 
or-none principle, and (b) those that give a variable score such as time 
in seconds, or number of right answers or number of errors. 

4. It should not be required to submit every subject to the whole 
range of the scale. The starting point and the terminal point, being 
selected by the examiner, should not directly affect the individual 
score. 

5. It should be possible to use the scale so that a rational score may 
be determined for each individual subject and so that the performance 
of groups of subjects may be compared. 

6. The arithmetical labor in determining individual scores should 
be a minimum. 

7. The procedure should be as far as possible consistent with
psychophysical methods so that it will be free from the logical errors
involved in the Binet scale and its variants.

In every form of mental measurement, whether it be the determina- 
tion of a psychophysical limen, a mental age, or the relative standing 
of an individual with reference to a group, we have always a more or 
less definitely graded series of tasks which ranges from easy to difficult 
levels. The level of each task is determined by the proportion of 
correct answers for one subject, or for a group of subjects. The 
simplest and the most straightforward procedure is to give the subject
a series of tasks the answers to which can each be scored right or
wrong. The score for each subject will then be the total number of
tasks correctly done as scored by the all-or-none principle. If all mental
measurement could proceed on so simple a basis there would be no 
problem of scoring. A score would be simply the total number of 
correct answers which could readily be interpreted by a scatter dia- 
gram with age or other test criterion. 

One complication arises in deciding upon the level of the starting 
point for a particular subject. If the scale includes tasks for 3-year- 
olds up to 18-year-olds, one can hardly expect the older subjects to 
take good-naturedly the infantile questions. One must skip over 
them and start somewhere just below the point where any particular 
subject might conceivably begin to make some mistakes. In present- 
ing the scale to a subject, one selects a starting point at as low a level 
as may be necessary to catch and record the lowest errors on the scale 
that the subject is likely to make. But then the score for each subject 
will not be merely his total number of right answers. It must be 
corrected by the addition of the total number of tasks below the 
starting point that was selected for him. This can, of course, be taken 
care of readily enough by simply adding to his total number of right 
answers the number of tasks below his starting point. That is the 
procedure actually followed in the Binet test where credit is given for 
all the questions below the age group at which one begins with the 
particular child. This introduces one of the requirements that we 
shall make of a scoring method, namely, that it shall be relatively 
independent of the starting point on the scale. 
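
The starting-point correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `corrected_score`, the list-of-marks representation, and the sample record are hypothetical, and the sketch assumes, as the text does, that every task below the starting point would have been passed.

```python
def corrected_score(responses, tasks_below_start):
    """Right answers among the tasks given, plus credit for skipped easy tasks.

    responses: True/False marks for the tasks actually presented.
    tasks_below_start: number of easier tasks skipped (assumed passed).
    """
    return tasks_below_start + sum(1 for r in responses if r)


# Hypothetical subject: marks on the full scale, easiest task first.
full_record = [True, True, True, True, True, False, True, False, False]

# Start at the bottom of the scale, or skip the first three tasks.
score_from_bottom = corrected_score(full_record, 0)
score_from_task_4 = corrected_score(full_record[3:], 3)
print(score_from_bottom, score_from_task_4)
```

Both calls give the same score because the skipped tasks were all within the subject's reach, which is the sense in which the corrected count is independent of the starting point.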

It happens not infrequently that there is some good reason why 
certain questions should be omitted for a particular subject. For 
example, the examiner may discover that the subject has been coached 
on certain tasks which, of course, would rule them out as test measures. 
There may be lack of time for completing the whole scale. If tests 
are given to very young children, they may simply refuse point-blank 
to be interested in some of the tasks. Such responses can hardly be 
called failures in the adult sense and they must therefore be omitted 
for certain children. The omission of certain test questions introduces
a second complication in the scoring method. One cannot simply
count the number of right answers as the individual score when some 
of the questions have been omitted. 

In order to take care of such emergencies one might count the 
proportion of right answers as the score instead of counting the absolute
number of right answers. Such a scoring method would be fair to
all the subjects in that if one subject answers more questions than
another, he would be expected to answer the same proportion of questions
correctly in order to get the same score. But such a procedure
breaks down when we look closer at the possibilities. Suppose that 
we omit 10 difficult questions for one subject and 10 easy questions 
for another subject. Evidently the former would have a distinct 
advantage and the two individual scores would not represent the 
respective capacities even if they were determined by the proportions 
of right answers. This consideration makes it clear that whenever 
some questions are to be omitted, we must take account of the level 
of difficulty of the omitted questions. We see, therefore, that to 
score each subject by counting merely the total number of right 
answers, corrected for the starting point on the scale, or modified 
so as to express the proportion of right answers, is not flexible enough 
to take care of the situations that arise in individual mental 
examinations. 
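
A small numerical sketch may make the objection concrete. Everything in it is assumed for illustration: a hypothetical scale of 30 tasks with difficulties 1 to 30, and a deliberately crude model in which a subject passes exactly those tasks whose difficulty does not exceed his capacity.

```python
def proportion_right(difficulties, ability):
    """Proportion of the administered tasks passed under the crude model:
    a task is passed when its difficulty does not exceed the ability."""
    passed = [d for d in difficulties if d <= ability]
    return len(passed) / len(difficulties)


scale = list(range(1, 31))   # 30 hypothetical tasks, difficulty 1..30
ability = 15                 # the same capacity for both subjects

omit_hard = scale[:20]       # the 10 most difficult tasks omitted
omit_easy = scale[10:]       # the 10 easiest tasks omitted

p_hard_omitted = proportion_right(omit_hard, ability)
p_easy_omitted = proportion_right(omit_easy, ability)
print(p_hard_omitted, p_easy_omitted)
```

Two subjects of identical capacity receive proportions of 0.75 and 0.25, so the proportion of right answers cannot serve as a score unless the difficulty of the omitted questions is taken into account.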

In order to take care of these contingencies in which a few difficult 
questions may be omitted for one subject and a few easy questions for 
another subject, it is necessary to determine the score as a summation 
of performances at successive levels on the scale. We are reduced now 
to the possibility of determining the proportion of right answers at 
each level. For each class interval on the scale we might determine
the proportion of right answers. This procedure allows for the omis- 
sion of questions and it takes into consideration that the omitted 
questions may be widely different in difficulty. But what shall we 
do with these ratios when we have them? The score of each subject 
would then be a series of ratios, one for each level on the scale, indi- 
cating the proportion of right answers for each grade of difficulty. 
But what we require in a score is the allocation of the individual sub- 
ject to a particular point on the scale which shall represent his capac- 
ity. This problem is analogous to the psychophysical problem of 
determining a difference limen. 

If we were to plot a chart for each subject showing the proportion 
of right answers at each level of difficulty we should have a curve like 
Fig. 1. At low levels on the scale the performance is perfect and the 
proportion of right answers is there 100 per cent. As the subject 
encounters more difficult levels on the scale the proportion of right 
answers decreases and it finally reaches 0 per cent at the higher levels 
of difficulty. Our problem is now to assign some point on the scale
which shall represent the subject, very much as an average represents
a set of numbers. The logical point is the scale value at which the
subject gets half of the problems right and half of them wrong. That
scale value, or score, is at the point S in Fig. 1. The graph shows the
theoretical curve, R, for the proportion of right answers and the
corresponding theoretical curve, W, for the proportion of wrong answers.
[Fig. 2]


It will be noticed that the sum of the proportions of right and wrong 
answers is equal to unity. Where the proportion of right answers is
100 per cent, the corresponding proportion of wrong answers is of course 0.
When both of the proportions are 50 per cent, the two curves cross at
the scale value S.

It will be seen, however, that this scheme will not work without 
further modification. Suppose that the record for a particular subject 
should look like Fig. 2. Here we have plotted a polygon for the proportion
of right answers for different scale levels for a hypothetical
subject. The polygon represents his supposed performance at the 
different scale levels. The polygon crosses the 50 per cent level 
twice (near the points 5 and 7) so that the procedure is ambiguous 
already in that we should not know which of the two scale values to 
choose as a score. The point that we are really seeking is the scale 
value at which the theoretical curve passes the 50 per cent level, but 
that is not immediately given by the polygon of actual performance.
It is possible to fit a smooth curve through the points but that requires 
an amount of calculation which is absolutely prohibitive in the rou- 
tine scoring of mental test performance for a large number of subjects. 
The numerical data at the bottom of the graph in Fig. 2 show the 
number of right and wrong answers and the total number of questions 
asked at each step of the scale in the hypothetical performance. 

There is another way of defining the point S in Fig. 1. Note that 
if the distribution of gains represented by the curve R is symmetrical,
the two cross hatched areas will be equal. They represent the sum- 
mation of proportions of correct performances above S and the cor- 
responding summation below S. Since approximate symmetry can 
be assumed fairly safely, we shall define the score, S, as that point on 
the scale at which the sum of right answers above it equals the sum of 
wrong answers below it. 

If the summation of right answers were made below the point S, 
or the summation of wrong answers above that point, the score would 
be directly affected by the starting and terminal points on the scale. 
In fact, the score would be indeterminate. This difficulty is not 
encountered, if the score is defined as that scale value above which 
there are as many correct answers as there are wrong ones below it. 

If this scoring method is applied to the hypothetical performance of 
Fig. 2, it results in a plot such as Fig. 3 in which the two curves represent
the summations of right and wrong answers, starting at the top
of the scale for the summation $\Sigma R$ and at the bottom of the scale for
the summation $\Sigma W$.

An essential feature of Fig. 3 must be noted in determining the 
scale value at which the two curves cross. Each ordinate on the curve
R represents the summation of right answers down to the lower edge of
the class interval considered, whereas each ordinate of the curve W 
represents a summation of wrong answers up to the upper edge of the 
class interval considered. These ordinates cannot, therefore, be 
plotted over the midpoints of their respective class intervals on the 
scale. This can be verified by comparing the hypothetical data under
Fig. 2 with the ordinates of Fig. 3. 

Having adopted a principle for determining the individual score, 
it remains to derive a simple formula by which the score for each 
subject may be determined by a minimum of arithmetical labor.
The score for the hypothetical performance that we are considering is
the intersection of the two curves in Fig. 3. Let $R_1$ and $W_1$ be the
summations $\Sigma R$ and $\Sigma W$ at the bottom (6) of the class interval in
which the two curves intersect. Let, similarly, $R_2$ and $W_2$ be the
corresponding summations at the top (7) of the same class interval.
Let $x$ represent the fraction of the class interval between the score S
and the bottom of the class interval. Then it can be shown that

$$W_1 + x(W_2 - W_1) = R_2 + (1 - x)(R_1 - R_2)$$

which reduces to the formula

$$x = \frac{R_1 - W_1}{(R_1 - R_2) + (W_2 - W_1)}$$

The detailed application of the method is shown in Table I for our
hypothetical subject. It will be seen that the formula is exceedingly
simple in application and the numerical values are small and can
be easily handled in a routine manner. The calculation of the individual
score is as follows:

$$R_1 = 9, \quad W_1 = 5, \quad R_2 = 3, \quad W_2 = 8$$

Hence

$$x = \frac{9 - 5}{(9 - 3) + (8 - 5)} = \frac{4}{9} = 0.44$$

or

Score = 6.44
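
The whole procedure, from the counts at each class interval to the interpolated score, can be sketched as follows. This is a minimal illustration, not the author's own tabular routine: the function name `crossing_score` is hypothetical, and the per-interval counts are invented so that the cumulative sums at scale values 6 and 7 come out as 9, 5, 3 and 8, the figures quoted in the text. Interval `i` lies between `edges[i]` and `edges[i + 1]`.

```python
def crossing_score(edges, rights, wrongs):
    """Scale value above which there are as many right answers
    as there are wrong answers below it (linear interpolation)."""
    n = len(rights)
    if len(wrongs) != n or len(edges) != n + 1:
        raise ValueError("need one more edge than class intervals")
    # Sum of rights from the top down to each edge; sum of wrongs from
    # the bottom up to each edge.
    sum_r = [sum(rights[k:]) for k in range(n + 1)]
    sum_w = [sum(wrongs[:k]) for k in range(n + 1)]
    above = [k for k in range(n) if sum_r[k] > sum_w[k]]
    if not above:
        raise ValueError("the two summation curves do not cross")
    k = above[-1]  # highest edge at which the rights still exceed the wrongs
    r1, w1, r2, w2 = sum_r[k], sum_w[k], sum_r[k + 1], sum_w[k + 1]
    x = (r1 - w1) / ((r1 - r2) + (w2 - w1))
    return edges[k] + x * (edges[k + 1] - edges[k])


# Invented counts for class intervals 3-4, 4-5, ..., 9-10.
edges = [3, 4, 5, 6, 7, 8, 9, 10]
rights = [6, 4, 2, 6, 2, 1, 0]
wrongs = [0, 2, 3, 3, 4, 5, 6]

score = crossing_score(edges, rights, wrongs)
# Same subject, but started one interval higher on the scale.
score_late_start = crossing_score(edges[1:], rights[1:], wrongs[1:])
print(round(score, 2), round(score_late_start, 2))
```

Dropping the lowest interval, in which every answer was right, leaves the score unchanged, since only the rights above the crossing point and the wrongs below it enter the calculation. Because the interval width is used explicitly, unequal class intervals are handled by the same code.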


The scoring method just described has several directions of freedom 
in its favor. It is possible to omit several test questions without 
seriously affecting the individual score. It is not necessary to have 
the same number of questions at each level of difficulty. The score is
unaffected by the starting point or the terminal point on the scale for
each subject. The method also makes it possible to use a scale in
which the class intervals are not equal, in case that is desired.

It is possible to extend this scoring method still further so as to 
allow even more freedom in test construction. The procedure so far 
described takes care of all of the seven requirements except the third. 
This requirement stipulates that it should be possible to include in
the same scale two forms of test question, namely (a) those that are 
scored either right or wrong by the all-or-none principle, and (b) those 
that give a variable score such as time in seconds or number of right 
answers or number of errors. 

Suppose that when a test is standardized the average or median 
scores for 7-, 8-, and 9-year old children are found to be 30, 60, and 80 
respectively. Each one of these levels of achievement may be con- 
sidered as a separate task which is either passed or failed. If the 
score of a particular child on this test is 75, he would be marked as 
passing the two test levels for 7 and 8 years but not for the 9 years. 

In this manner any test that gives either an ascending series of scores
with age or a descending series with age, such as time consumed, may
be divided into steps determined by the average performance of
children of the successive age levels. In the above example, the child
would be scored with two pass marks and with one failure from the
single score of 75. A test that is scored only right or wrong should 
be allocated to that age at which half of the children pass it. In this 
manner consistency is attained in combining into one scale two types 
of test, namely (a) those that give only right or wrong answers, and 
(b) those that give a variable score. The latter type of score would be 
marked as pass marks for all age levels below the score and as failures 
for all age levels above the attained score. 

In Fig. 4 is presented a detailed method of handling such data in 
the form of a single scale. The figure represents the record sheet for 
an individual examination. Most of the items can be printed on the 
form so that the recording of an individual performance would consist 
mainly in making check marks or rings and a simple summation at 
the bottom of the record sheet. 

Let the first column represent a list of test elements (fictitious 
in this case) for the purpose of illustrating the method. Some of 
these tests may be such as occur in the Binet test series and which 
are scored either right or wrong. Some of the tests may be such that
they give a variable score. In the second column indicate by check
marks the tests that are omitted by the examiner or refused by the
subject. In the third column record by check marks the tests that
are so difficult that the subject did not succeed in attaining the lowest
acceptable score in them.

[Fig. 4. Record sheet with columns for test elements, tests omitted or refused, supraliminal tests, and scale values.]

The scale values from 3 to 16 in this example are indicated at the 
top of the schedule. In the appropriate squares are recorded the
median scores attained by children of the designated ages. For
example, on test 1 the median score for children 3.5 years old is 12 and 
for children 4.5 years old, it is 21. These median scores should be 
printed on the schedule. If the particular child in our example should 
score, say, 26 in test 1, the examiner merely rings the two printed 
median scores, both of which are exceeded. 

In test 12, for example, the hypothetical performance was 75 and 
the examiner therefore rings the two printed median scores of 30 and 
60 but not 80. In some of the tests the performance is a descending 
series with age as when total time for a form board is recorded. Let 
test 5 represent such a test and let the performance of the particular 
subject be 15. The examiner then rings the three printed median 
time scores that are slower but not the one printed time score that is 
faster. 
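
The conversion of a variable score into pass and fail marks can be sketched as below. The function name `pass_marks` is hypothetical, the medians for the timed test are invented for illustration (the text does not give them), and the sketch assumes that a score exactly equal to a printed median counts as a pass, a case the text does not settle.

```python
def pass_marks(score, medians, ascending=True):
    """Return {age level: True for a ring (pass), False for a failure}.

    medians: printed median score for each age level of one test.
    ascending: True when scores rise with age, False when they fall
    with age (seconds taken, number of errors).
    """
    if ascending:
        return {age: score >= m for age, m in medians.items()}
    return {age: score <= m for age, m in medians.items()}


# The example from the text: medians 30, 60, 80 and a score of 75.
marks_75 = pass_marks(75, {7: 30, 8: 60, 9: 80})

# A timed test with invented medians and a performance of 15 seconds.
time_marks = pass_marks(15, {5: 40, 6: 30, 7: 20, 8: 12}, ascending=False)
print(marks_75, time_marks)
```

The score of 75 yields two rings and one failure, and the time of 15 yields three rings for the slower medians and a failure for the one faster median, as in the two cases described above.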

It is noticed that in Fig. 4 the age range varies for the different 
tests. The age range that is used in a scale should extend as far as 
sharp differentiation obtains, but no farther. Thus, if on question 
13 the 12-year old children have a median score of 80, or only slightly 
above 80, their median score will not be used in the scale because the 
test does not sufficiently differentiate between the 11- and the 12-year 
levels. In this way the scale is not obscured with a non-differentiating 
load. 

Question 24 is checked in the supraliminal column which 
indicates that while the test was given to the particular child of 
this schedule, the child was unable to attain the lowest acceptable 
score, 5, in this test. Test 14 is checked in the same way, thus indicating
that the child was unable to attain the lowest acceptable score,
45, in this test. A descending score like that of question 14 repre- 
sents either the number of errors in a task or the number of seconds 
required to perform the given task. 

All questions that are scored by the all-or-none principle are indi- 
cated by an X at the age level to which the question has been assigned. 
Every such question is allocated to that age-group at which 50 per cent 
of the children fail on it. In this way the procedure of age allocation 
for the two types of test is consistent in that for each age level one 
should expect half of the children to fail on every right-wrong question 
at that level as well as on every median test score assigned to that 
age group. For the right-wrong questions, as for the variable-score 
questions, a pass is indicated by a ring, and a check mark in the 
supraliminal column indicates entire failure to score on the test.

Before proceeding to determine the mental age, or scale allocation
of the child, the examiner crosses out the printed scores for the
tests which were omitted. In the example of Fig. 4 these are the test 
questions 4, 9, 10, 17, 20. He then adds up the number of rings 
(passes) and the number of failures in each column. These sums are 
indicated in the two rows R and W at the bottom of the schedule.
The summation for the number of right answers, $\Sigma R$, proceeds from
the top of the scale, while the summation for the number of wrong
answers, $\Sigma W$, proceeds from the bottom of the scale as indicated in the
two bottom rows of the schedule. It is not necessary to complete
these two summations over the whole range. They need be carried
only far enough to ascertain the point at which the two curves $\Sigma R$
and $\Sigma W$ cross. By inspection it is seen that the highest scale value
at which $\Sigma R$ is greater than $\Sigma W$ is 9 and that scale value is therefore
ringed at the top of the schedule. The increment, x, above 9 in the 
individual score is determined by the formula described above. The 
calculation is as follows: 

$$x = \frac{6 - 3}{(R_1 - R_2) + (W_2 - W_1)} = \frac{3}{4} = 0.75$$

Hence
Score (or mental age) = 9.75
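
The tallying of the rows R and W from a record sheet can be sketched as follows. The data structure is an assumption made for illustration (each test lists the scale columns in which it has a printed entry, the entries that were ringed, and whether it was omitted), and the five sample tests are invented rather than taken from Fig. 4.

```python
from collections import Counter


def tally(record):
    """Count rings (rights) and failures (wrongs) in each scale column.

    Omitted or refused tests are crossed out and contribute nothing.
    A supraliminal test has no rings, so every entry counts as a failure.
    """
    rights, wrongs = Counter(), Counter()
    for test in record:
        if test.get("omitted"):
            continue
        for level in test["levels"]:
            if level in test.get("ringed", ()):
                rights[level] += 1
            else:
                wrongs[level] += 1
    return rights, wrongs


record = [
    {"levels": [3, 4], "ringed": [3, 4]},      # both medians reached
    {"levels": [7, 8, 9], "ringed": [7, 8]},   # passes 7 and 8, fails 9
    {"levels": [8], "ringed": []},             # right-wrong question, failed
    {"levels": [9, 10], "ringed": []},         # supraliminal test
    {"levels": [5, 6], "omitted": True},       # omitted, not counted
]
rights, wrongs = tally(record)
print(dict(rights), dict(wrongs))
```

The two counters correspond to the rows R and W at the bottom of the schedule; cumulating the rights from the top and the wrongs from the bottom then locates the crossing point.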


The method just described solves the problem of ascertaining an 
individual score under the seven conditions. The method does not 
require the same number of test elements at each step on the scale. 
It is possible to omit test questions at the discretion of the examiner, 
or by the refusal of the subject without increasing the labor of calcu- 
lating the individual score. This method makes it possible to com- 
bine into one scale a wide variety of forms of mental or educational 
tests, including the right-wrong type of question, questions scored by 
the amount of time required to perform a given task, questions scored 
by counting the number of errors, or by counting the number of right 
answers, or by counting the amount done in a given time. In fact the 
method here proposed makes it possible to include in a single unified 
scale an almost unlimited variety of forms of objective test question 
or task. 

It should be noted that the hypothetical example chosen for the 
purpose of explaining the method has much more scatter of perform- 
ance than would ordinarily be found in any educational scale or psycho- 
logical test, but that has been intentionally done to make the example 
perhaps more effective.

The basic principle here adopted for the purpose of scoring individual
performance is that the individual score should be that point
on the scale above which there are as many right answers as there are 
wrong answers below it. By an extension of this principle to the 
variable-score tests it is possible to include a wide range of test forms 
into a single unified scale. Theoretically, we should really score 
individual performance by locating a point on the scale so that the 
summation of the proportions of right answers above that point is 
equal to the summation of the proportions of wrong answers below 
that point. These proportions should, further, be weighted by the 
average scale distance of the two adjacent class intervals, in case 
the class intervals on the scale are not uniform. But this procedure 
is prohibitive because of the arithmetical labor involved for every 
individual subject. The method here proposed is a workable compro- 
mise which will probably serve all practical purposes. 








VARIATIONS OF THE PRODUCT-MOMENT 
(PEARSON) COEFFICIENT OF CORRELATION 


PERCIVAL M. SYMONDS 


Teachers College, Columbia University 


This paper presents 52 variations of the product-moment (Pearson) 
coefficient of correlation, develops and applies principles for choosing 
which of these formulas to use under different conditions, discusses 
printed forms and short cuts for computing these coefficients on a 
large scale, and attempts to trace the historical development of these 
variations.

The 52 variations follow. To the mathematician some of the 
variations are so similar as to be hardly worth rewriting, but each 
formula really represents a different rule of computing procedure. 
The symbols used have the following meanings: 

r = product-moment coefficient of correlation.
$\Sigma$ = sum of.
x and y = deviations from the mean of the two variables being correlated.
$\sigma_x$ and $\sigma_y$ = standard deviations of the two variables.
$b_1$ and $b_2$ = regression coefficients of the two variables.
x' and y' = deviations from an assumed mean.
X and Y = gross scores.
$M_x$ and $M_y$ = means of the two variables.
N = number of cases.

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad (1)$$

This may be changed by noting that $\sigma = \sqrt{\Sigma x^2 / N}$:

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2}\,\sqrt{\Sigma y^2}} \qquad (2)$$

$$r = \sqrt{b_1 b_2} \qquad (3)$$

No. 4 is the formula when the mean is assumed at a point distance $c$
from the true mean: $c = x' - x$.

$$r = \frac{\Sigma x'y' - N c_x c_y}{\sqrt{\Sigma x'^2 - N c_x^2}\,\sqrt{\Sigma y'^2 - N c_y^2}} \qquad (4)$$
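
The agreement between the true-mean form (No. 2) and the assumed-mean form (No. 4) can be checked numerically. The following is a sketch only: the function names and the five pairs of scores are hypothetical, and the corrections $c_x$ and $c_y$ are computed as the mean deviation from the assumed mean, which is what $c = x' - x$ implies.

```python
import math


def r_true_mean(xs, ys):
    """Formula 2: deviations taken from the true means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / (math.sqrt(sum(a * a for a in dx)) *
                  math.sqrt(sum(b * b for b in dy)))


def r_assumed_mean(xs, ys, ax, ay):
    """Formula 4: deviations taken from assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    cx, cy = sum(dx) / n, sum(dy) / n   # distance of assumed from true mean
    num = sum(a * b for a, b in zip(dx, dy)) - n * cx * cy
    den = (math.sqrt(sum(a * a for a in dx) - n * cx * cx) *
           math.sqrt(sum(b * b for b in dy) - n * cy * cy))
    return num / den


xs = [2, 4, 5, 7, 9]   # hypothetical gross scores
ys = [1, 3, 4, 8, 9]

r_exact = r_true_mean(xs, ys)
r_near = r_assumed_mean(xs, ys, 5, 5)   # integral assumed means near the true means
r_zero = r_assumed_mean(xs, ys, 0, 0)   # assumed means at zero (gross scores)
print(r_exact, r_near, r_zero)
```

All three values agree to within rounding error. What changes with the choice of assumed mean is only the size of the intermediate sums, which is the practical point made in the principles that follow.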
The next three are variations of No. 4. Each may be derived from
the preceding by multiplying by N.

Nos. 8, 9, 10 are for the case where the assumed mean is at zero.

Nos. 11 and 12 are variations of 8 and 9 where M has been substituted

The next group is derived from No. 1 by the relationship

Nos. 15, 16, and 17 may be derived from 5, 6, 7 by the relation

$$\Sigma xy = \frac{\Sigma x^2 + \Sigma y^2 - \Sigma (x - y)^2}{2}$$

Nos. 18, 19 and 20 are the same as the three preceding where the
mean is assumed at zero.

Nos. 21 and 22 are variations of 18 and 19 with M substituted for

Nos. 23 and 24 are useful when the $\sigma$’s and M’s are known but one
wishes to work with differences in gross scores.

No. 25 may be derived by setting $\sigma_x = \sigma_y$ in No. 13.

No. 31 is similar in structure to No. 13 and is derived from the
relationship

$$\tfrac{1}{2}\left[\Sigma (x' + y')^2 - \Sigma x'^2 - \Sigma y'^2\right] = \Sigma x'y'$$

No. 49 may be obtained by eliminating $\sigma$ from 25 and 43


[Fragments of the formulas that follow: $\sigma^2_{(x+y)}$, $\sigma^2_{(x-y)}$; $\Sigma (X + Y)^2$, $\Sigma (X - Y)^2$, $2N$, $M_x$, $M_y$]

General principles to be observed in choosing formulas for product 
moment coefficients of correlation: 

1. Work with deviations from an assumed mean rather than with 
deviations from the true mean. 

Formulas 1 and 2, 13 and 14, 31 and 32 are of theoretical value 
only. In actual practice no one thinks of taking deviations from the
true mean. Usually one assumes a mean with an integral value in 
order to save computation, preferring to apply such corrections as 
are found in formula 6. 

2. The nearer the assumed mean is to the true mean the smaller 
the numbers in the computation. 

For ease of computation and for general practice one would choose 
such formulas as Nos. 5, 6, 7 in preference to Nos. 8, 9, or 10. Formulas
8, 9, and 10, using gross scores, run into larger numbers. Toops
avoids this by transmuting all scores into derived scores with a maxi- 
mum of 20. This transmutation process, even though facilitated by 
the use of a table, is an extra step which is time-consuming. One 
would only use formulas 8, 9, or 10 if he had a calculating machine 
available. With such a machine available the work may be easily 
executed provided N be not too large. But in general for “hand”
work the formulas involving $x'$ and $y'$ are preferable.

3. As between the product-moment formula 1 and the difference
formula 13, or between the formulas which one would use in computing,
6 and 16, it is hard to make a distinction. As far as my analysis goes
it is practically as easy to get the sum of the product moments $\Sigma x'y'$
as to get the sum of the squared differences $\Sigma (x' - y')^2$. Granting
this, the product-moment formula 6 is easier to compute because it
involves two fewer steps.

4. As between formulas 5, 6, and 7, I prefer 6. Formula 5 involves
five division operations and two square operations. Formula 6
involves three division operations and two square operations. Formula
7 involves three multiplication operations and two square operations.
Choice between 6 and 7 resolves itself into a question of multiplication
vs. division. With a calculating machine or tables multiplication is
to be preferred and hence 7. With a slide rule I prefer division
to multiplication and hence 6. However, if one wishes to get the
$\sigma$’s and M’s as well as r, 5 should be used, as the two radicals in the
denominator of the fraction are the two $\sigma$’s respectively, and the two
terms $\Sigma x'/N$ and $\Sigma y'/N$ are corrections for finding the mean from the assumed
mean.
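
The comparison in principle 3 rests on the fact that the sum of product moments can be recovered from the sum of squared differences. A brief sketch, with hypothetical function names and invented deviations from an assumed mean, shows the two routes giving the same quantity.

```python
def sum_products_direct(dx, dy):
    """Sum of product moments, computed term by term."""
    return sum(a * b for a, b in zip(dx, dy))


def sum_products_from_differences(dx, dy):
    """The same sum recovered from the squared differences."""
    sq_x = sum(a * a for a in dx)
    sq_y = sum(b * b for b in dy)
    sq_d = sum((a - b) ** 2 for a, b in zip(dx, dy))
    return (sq_x + sq_y - sq_d) / 2


# Invented deviations from an assumed mean.
dx = [-3, -1, 0, 2, 4]
dy = [-4, -2, -1, 3, 4]

direct = sum_products_direct(dx, dy)
via_differences = sum_products_from_differences(dx, dy)
print(direct, via_differences)
```

The difference route needs the two sums of squares in any case, so its extra cost is the subtraction and halving at the end; whether it saves labor depends on whether squaring small differences is easier than forming signed products.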

For all single correlations I recommend the use of formula 6.
Where N is less than 30 it may be easier to do the work directly in 
columns, finding the means by adding and dividing by N, and taking 
deviations from the nearest integral value to the mean. Where N is 
over 30 it is easier to distribute the scores and work from a two-place 
scatter graph. 

Where a computing machine is available it is oftentimes simpler 
to use formula 10. 

For very easy work, that is where N is less than 15 and where the 
values of the various variables are very small, i.e., under 7, I recommend
the use of formula 19.

In case one has an intercorrelation problem, I prefer formula 23. 
Suppose one has n variables for which one wishes to find all of the 
intercorrelations. In such a case there are several constants such 
as $\Sigma x$, $\Sigma y$, $\Sigma x^2$ and $\Sigma y^2$ each of which is used in (n − 1) of the
correlations. They may be found once for all and may then be 
used in each of the n — 1 correlations involving them. Since they 
are to be found once for all it is probably easier to find them from 
assumed means near the true mean rather than from assumed mean at 
zero. On the other hand the sum of product moments or the sum of 
squared differences must be found separately for each correlation 
coefficient. It may be more advantageous, therefore, to find them from
the gross scores. Formula 23 or 24 fulfills these requirements and I
believe that intercorrelations may be found most economically by
their use. 
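
The economy described here, finding the single-variable sums once and only the cross term separately for each pair, can be sketched as below. The sketch does not reproduce formula 23, which is not legible in this copy; it substitutes the familiar gross-score form of the product-moment coefficient, and the function name and the three short variables are hypothetical.

```python
import math
from itertools import combinations


def intercorrelations(variables):
    """All pairwise r's, reusing each variable's sum and sum of squares."""
    names = list(variables)
    n = len(variables[names[0]])
    # Found once for all, then used in every correlation involving the variable.
    sums = {k: sum(v) for k, v in variables.items()}
    squares = {k: sum(a * a for a in v) for k, v in variables.items()}
    table = {}
    for a, b in combinations(names, 2):
        # Only the sum of cross products must be found anew for each pair.
        sxy = sum(p * q for p, q in zip(variables[a], variables[b]))
        num = n * sxy - sums[a] * sums[b]
        den = math.sqrt((n * squares[a] - sums[a] ** 2) *
                        (n * squares[b] - sums[b] ** 2))
        table[(a, b)] = num / den
    return table


scores = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],   # rises exactly with A
    "C": [4, 3, 2, 1],   # falls exactly with A
}
table = intercorrelations(scores)
print(table)
```

With n variables the two dictionaries hold 2n constants, while the loop runs over the n(n − 1)/2 pairs, which is where the bulk of the labor lies.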

There are exceptions to the above recommendations. If one wishes
to find the M and $\sigma$ of the variables as well as the coefficients of
correlation one will use 5 rather than 6, 8 rather than 10, and 18 rather
than 19. Then there are special formulas to use where the $\sigma$’s are
equal (formula 27 is recommended) or when both the $\sigma$’s and M’s
are equal, 28.


PRINTED FORMS


Printed forms for the facilitation of computing coefficients of 
correlation have been constructed by Ruger, Toops, Otis, Kelley, 
Thurstone, Holzinger, and Ruch-Stoddard. 

Ruger’s form is merely a blank graph with the basic product- 
moments printed in each square. Any product-moment formula 
may be used with this chart. 

Toops has devised a chart which leads to computation by formula 
20. This is a variation of the difference formula, taking all deviations 
from zero (i.e., using gross scores). Toops avoids large numbers by
transmuting all original data by means of a table. 

Otis has devised a form which computes the correlation coefficient 
by the way of formula 16. This is also a variation of the difference 
formula 13. Otis’ method differs from Toops’ in that Otis uses 
assumed means near the true mean whereas Toops uses assumed 
means at zero. Toops’ is to be preferred when machines are at hand
and one uses clerical assistance. Otis’ is the better of these two for
ordinary hand work. 

Kelley has devised a chart which uses what is essentially formula 5 
although the $\sigma$’s are found in the process of computation and are
inserted before the final computation of r. Kelley has a unique set 
of steps which checks every step of the work according to the following 
relations: 


$$\frac{\Sigma x'}{N} = \frac{\Sigma (x' + y') + \Sigma (x' - y')}{2N}$$

$$\frac{\Sigma y'}{N} = \frac{\Sigma (x' + y') - \Sigma (x' - y')}{2N}$$

[The remaining check relations are illegible in this copy.]

The values $\Sigma (x' + y')$, $\Sigma (x' + y')^2$, $\Sigma (x' - y')$ and $\Sigma (x' - y')^2$
are found by taking the sum of diagonals from lower left to upper right
(x − y) and from upper left to lower right (x + y) of the scatter.
The thorough-going statistician demands some such independent 
check on a piece of computation. However, most persons will not 
take the time to perform the added labor in order to guarantee that 
their work is correct. 
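
An independent check of this kind can be sketched as follows. The first two relations are those given above for the sums of deviations; the other two are standard algebraic identities supplied here for illustration and are not necessarily the form printed on Kelley's chart. The function name and the deviations are hypothetical.

```python
def diagonal_checks(dx, dy):
    """Recover the usual sums from the sums and squared sums of
    (x' + y') and (x' - y'), as an independent check on the work."""
    s_plus = sum(a + b for a, b in zip(dx, dy))
    s_minus = sum(a - b for a, b in zip(dx, dy))
    q_plus = sum((a + b) ** 2 for a, b in zip(dx, dy))
    q_minus = sum((a - b) ** 2 for a, b in zip(dx, dy))
    return {
        "sum_x": (s_plus + s_minus) / 2,
        "sum_y": (s_plus - s_minus) / 2,
        "sum_xy": (q_plus - q_minus) / 4,        # identity, assumed form
        "sum_squares": (q_plus + q_minus) / 2,   # sum of x'^2 plus sum of y'^2
    }


# Invented deviations from an assumed mean.
dx = [-3, -1, 0, 2, 4]
dy = [-4, -2, -1, 3, 4]

checks = diagonal_checks(dx, dy)
print(checks)
```

If the values recovered from the diagonals disagree with those found directly from the rows and columns of the scatter, an arithmetical slip has occurred somewhere in the work.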

Thurstone has devised a form which leads to computation by what 
is approximately formula 5. The unique feature of Thurstone’s 
chart is his analysis of the steps in finding the sum of the product- 
moments. 

Holzinger has prepared a computation form which uses what is 
essentially formula 6. Holzinger has enlarged the scope of his form 
so that $\eta$, the correlation ratio, may be computed at the same time.

Ruch-Stoddard have prepared a chart with formula 5 as the basis 
of computation. 

Of these seven printed forms for computing the product-moment 
coefficient of correlation I prefer that of Thurstone. This chart is con- 
veniently arranged and its computation procedure is simple. I, 
personally, am favorably disposed to Thurstone’s method for finding 
the $\Sigma x'y'$ which is the one step in the procedure by any method that
causes the most difficulty. If one objects to paying the high price 
that C. H. Stoelting charges for this form, the Ruch-Stoddard form 
will serve as a good second choice. 

Two machines have been invented for lightening the burden of 
computing the coefficient of correlation by performing some of the 
summing operations. 

Hull has invented a machine which computes $\Sigma X$, $\Sigma Y$, $\Sigma X^2$,
$\Sigma Y^2$ and $\Sigma XY$, leading to computation by means of formula 10 or
formula 8. 

Dodd has invented a machine on a different principle which also 
yields the constants N, $\Sigma X$, $\Sigma Y$, $\Sigma XY$, $\Sigma X^2$, and $\Sigma Y^2$ so that either
formula 8 or 10 may be employed. 


HISTORICAL DEVELOPMENT


The theory of correlation was developed by Bravais, Galton and 
Edgeworth. Pearson was the first to give the formula algebraic
expression and the product-moment coefficient is commonly called
the Pearson coefficient. Formula 1 is first found in Pearson’s
paper “Regression, Heredity and Panmixia” published in 1896.
There is evidence that this article, or at least the essential parts of it,
was in manuscript for some time before its formal publication. Pearson
refers to the formula himself in a brief note in 1895. Yule appears
to have used the formula in 1895 and refers to Pearson’s manuscript. 
Yule was the first to give the formula using deviations from an assumed 
mean, 4, in 1897. The formula for deviations from zero, 8, 9, 10, was 
first published by Harris in 1910, although it is in reality but a special 
case of the formula from an assumed mean. This formula from an 
assumed mean at zero seems to have had three independent discoveries:
one by Harris in 1910, one by Thurstone in 1917, and one by Ayres in
1920. Even now the formula is sometimes referred to as the Ayres 
formula because of the publicity given to it in the opening issues of 
the new Journal of Educational Research. This is a good example of 
scientific men doing research without endeavoring to find out what has 
been done in the same field in the past. 

The difference formula, 13, was given by Pearson in the same 
article in which he gave the basic formula, 1. The difference formula 
was also discovered independently by Boas in 1909. But this did not 
pass uncensored by Pearson who the following year printed a cutting 
rejoinder criticizing Boas for so carelessly claiming credit for originality 
without taking the trouble to ascertain previous work done in the field. 
The formula, 25, which is a special case of the difference formula when 
the two standard deviations are equal was given by Harris in 1909. 
To Chapman (1919) belongs the credit for the first derived forms of 
the difference formula and the use of the device for summing diagonals 
to find the sum of differences. 

Credit for the sum formula, 31, and the combinations of the sum
and difference formulas, 49, 50, belongs to Kelley, as I can find no
statement prior to the publication of Kelley’s Statistical Method.
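The several forms named in this history are algebraically equivalent routes to the same coefficient. The Python sketch below is offered only as an illustration of that equivalence. It assumes the usual textbook statements of the product-moment, raw-score (deviations from zero), difference, and sum forms, which may differ in notation from the numbered formulas of this article; the function names and the sample scores are hypothetical.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Standard deviation with N (not N - 1) in the denominator."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / len(v))

def r_product_moment(x, y):
    """Mean product of deviations from the true means, over the two sigmas."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cross / (sd(x) * sd(y))

def r_raw_scores(x, y):
    """Deviations taken from zero, i.e. computed from raw scores only."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def r_difference(x, y):
    """Difference form: uses the sigma of the differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    return (sd(x) ** 2 + sd(y) ** 2 - sd(d) ** 2) / (2 * sd(x) * sd(y))

def r_sum(x, y):
    """Sum form: uses the sigma of the sums x + y."""
    s = [a + b for a, b in zip(x, y)]
    return (sd(s) ** 2 - sd(x) ** 2 - sd(y) ** 2) / (2 * sd(x) * sd(y))

# Invented scores for five pupils on two tests.
x = [2, 4, 5, 7, 9]
y = [1, 3, 6, 6, 10]
results = [f(x, y) for f in (r_product_moment, r_raw_scores, r_difference, r_sum)]
```

All four entries of `results` agree to within rounding error, which is the sense in which the later formulas are restatements of the first rather than new coefficients.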

Formula 23 was first devised by Huffaker.

ACCURACY IN SCORING GROUP INTELLIGENCE
TESTS


RUDOLPH PINTNER 


Teachers College, Columbia University 


Group tests, as compared with individual examinations of the 
Binet type, are in general much more objective in their methods of 
scoring. We are apt at times to feel that they are so objective as to 
warrant little or no training on the part of the scorers. We frequently 
see it stated that any clerk may be employed to score tests. The 
need for as much objectivity of scoring as possible was clearly obvious 
in the army testing, where men without any training in psychology 
had to be used as scorers. It is from this period that our so-called 
“fool-proof” methods of scoring date.

To what extent are our methods of scoring “fool-proof”? To what
extent is there room for difference of opinion or divergence in the scor- 
ing of our group tests? How much emphasis must be given to this 
matter of scoring in the training of students in intelligence testing? 
Such questions have been brought to the writer’s attention in the 
process of training students to administer intelligence tests. That 
there are difficulties and ambiguities is obvious from the questions 
raised by the students in the course of their work. To obtain some 
measure of these difficulties it was proposed to experiment with the 
National Intelligence Test, which is a good example of our present 
group intelligence tests. The scoring of the N. I. T. is more objective 
than that of many other intelligence tests in constant use. There are 
only a few tests more objective in their method of scoring. 

An experimental blank of the N. I. T., Scale B, Form 1, was pre- 
pared by going through a great number of actual test blanks filled out 
by children. All kinds of peculiar responses made by children were 
incorporated in this blank. Samples of such responses follow: In 
Test 1, omission of decimal point in several items; omission of qualify- 
ing descriptions of answers, such as hrs., lbs., $, and the like; use of 
decimals for fractions; failure to reduce fractions to lowest terms, 
etc. In Test 2, the writing in of correct responses; the underlining of 
two or more words or of a word and a half or of other than the response 
words. In Test 3, writing in of responses; underlining both responses; 
deleting the wrong response. In Test 4, writing in response; crossing
out or encircling a word instead of underlining; deleting the three
wrong words and leaving the right one alone; underlining one and a half,
two or more words. In Test 5, response by marking all items D was
used.

The experimental blank, constructed in this fashion, is therefore 
not similar to any one child’s paper, but is to be considered as contain- 
ing a great number of the scoring difficulties which will confront the 
examiner in the course of his work. By giving this blank as an exer- 
cise in scoring, we may get a measure of the amount of difference of 
opinion that may exist with reference to ambiguous responses and 
also a general measure of the accuracy of the scorers in handling a 
paper, in weighting, adding, and the like. A sufficient number of 
copies of this experimental blank were made in order to provide one 
for each student and the scoring of the blank was made a class exercise. 

Later on a second blank was prepared of the National Intelligence 
Test, Scale B, Form 2, using the same types of ambiguities. The 
two blanks may be considered as roughly of equal difficulty. Several 
groups of students took part in the scoring. The 1923 group scored 
Form 1 only. The 1924 and 1925 groups scored Form 1 in October 
and Form 2 in January. The groups were given the first test before 
they had had any discussion of group testing or any practice in test- 
ing, except such as they had received previous to enrollment in the 
course, and in the case of a few students such previous practice was 
considerable. The class is composed of graduate students in educa- 
tion. Most of them have had experience as teachers or supervisors, 
and several have held positions as school psychologists. 

The experimental blanks were also scored by the writer and his 
assistants. Ambiguities were discussed by them and the most reason- 
able interpretation of the directions for scoring was taken. In this 
way a “correct” score was arrived at, but the writer is of course
aware that there is still room for a difference of opinion on several 
points. 

The Distribution of Total Scores.—The total scores obtained by 
the students differ from each other because of the different interpre- 
tations given to scoring directions, because of errors in scoring, and 
also because of errors in weighting and adding. 

The “correct” score for Form 1 was 66, and for Form 2 was 74.
The general tendency seems to be to score too severely. There is an
alarming range of scores given to the same paper, but we must remember
that this is not an ordinary paper such as the average child would
turn in. The total range of scores on Form 1 is from 3 to 85. Omitting
two “freak” scores of 3 and 18, the range is from 34 to 85. This
is equivalent to a range in mental age from about 7 to about 11.
In Form 2, which was given at the end of the course in mental testing,
the range was not so great, from 58 to 90, or equivalent in mental age
from about 8-10 to 11.

SCORES ALLOTTED BY STUDENTS

                     |        Form 1         |        Form 2
                     |    Groups     | Total |    Groups     | Total
                     | 1924  | 1925  |       | 1924  | 1925  |
Range                | 33-82 | 3-85  | 3-85  | 58-90 | 62-79 | 58-90
Upper quartile       |  62   |  61   |  61   |  74   |  73   | 73.5
Median               |  58   |  58   |  57   |  70   |  71   |  71
Lower quartile       |  54   |  55   |  54   |  67   |  68   |  67
Q                    |  4.0  |  3.0  |  3.5  |  3.5  |  2.5  | 3.25
Number of students   |  46   |  42   |  138  |  42   |  38   |  80
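The measure Q entered in the table of scores is the semi-interquartile range, half the distance between the upper and lower quartiles. The short Python sketch below shows the arithmetic, using the quartiles and medians as read from the columns for the combined groups and the “correct” scores of 66 and 74 stated above; the function and variable names are hypothetical.

```python
def semi_interquartile_range(q1, q3):
    """Half the distance between the lower quartile q1 and the upper quartile q3."""
    return (q3 - q1) / 2

# Quartiles of the total scores, all groups combined, as read from the table.
form1_q = semi_interquartile_range(q1=54, q3=61)    # Form 1, before training
form2_q = semi_interquartile_range(q1=67, q3=73.5)  # Form 2, end of the course

# Median of the students' scores minus the "correct" score for each form.
form1_bias = 57 - 66
form2_bias = 71 - 74
```

A negative difference means that the typical student scored the paper more severely than the writer and his assistants did.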


The differences between the medians of
scores given by the students and the correct scores are all negative,
showing the general tendency of the students to score too severely.
At the beginning of the term the difference is -9, and this is reduced
to -3 for the students tested at the end of the term. Practice and
discussion of difficulties and ambiguities evidently have reduced the
discrepancy. The slightly lower semi-interquartile range for Form 2
may also indicate closer agreement among the students after training.

The Number of Errors.—Errors are to be considered as differences
in interpretation of scoring items or actual errors due to oversight,
mistakes in subtotals of tests, mistakes in weighting and adding
subtotals to obtain total scores.

NUMBER OF ERRORS IN SCORING

                     | Form 1 |        Form 2
                     | Total  |    Groups     | Total
                     |        | 1924  | 1925  |
Range                |  0-40  | 1-14  | 1-17  | 1-17
Upper quartile       |   16   |   8   |  10   |   9
Median               |   12   |   6   |   6   |   6
Lower quartile       |    9   |   4   |   4   |   4
Q                    |  3.5   |  2.5  |   3   |  2.5
Number of students   |  138   |  42   |  38   |  80

The general decrease in number of errors in Form 2 over Form 1 is 
apparent. The total range has been considerably reduced, and the 
reduction of Q shows that the students agree among themselves better 
at the end of the term. Little supervised practice in scoring N. I. T. 
blanks had been given in the interval. The improvement must have 
been due to the discussion of the results of the experiment with Form 
1 which was used as a means of bringing up general principles of 
scoring. In addition to this, National Intelligence and other group 
tests had been given and scored by the students in connection with 
the several projects undertaken by the class. 

Discussion of scoring had undoubtedly taken place among various 
groups of students working on the same project. Improvement in 
certain gross errors in the 1924 Group may be noted from the following:

                                          Form 1   Form 2
[Label illegible]                            4        0
Errors in getting sub-totals of tests        7        8
Neglect to multiply by weights               1        1
Errors in R-W computation                    2        1
Crediting stereotyped response              14        0

There is distinct improvement in all types of error with the excep- 
tion of computing sub-totals of tests. The last type of error, what is 
called above “crediting stereotyped response,” is due to forgetting
the rule that, when a child uniformly marks one of two alternatives 
right down the page, he is to be credited with zero for the whole test, 
rather than be given credit for such items as he may have marked 
correctly by chance. 
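The rule just described can be stated as a short procedure. The Python sketch below is illustrative only. The function name, the representation of the child's marks as 'A' or 'B', and the use of rights minus wrongs (floored at zero) as the credit for an ordinary paper are assumptions made for the example; the scoring key of the test itself governs the credit actually allowed.

```python
def score_alternative_test(marks, key):
    """Score a two-alternative test (hypothetical helper).

    marks: the child's marks, one per item ('A', 'B', or None if omitted).
    key:   the correct alternative for each item.
    Returns 0 for a stereotyped paper, in which every attempted item carries
    the same alternative; otherwise rights minus wrongs, never below zero.
    """
    attempted = [m for m in marks if m is not None]
    if len(attempted) > 1 and len(set(attempted)) == 1:
        return 0  # uniform marking right down the page earns no credit
    rights = sum(1 for m, k in zip(marks, key) if m is not None and m == k)
    wrongs = sum(1 for m, k in zip(marks, key) if m is not None and m != k)
    return max(rights - wrongs, 0)

key = ['A', 'B', 'B', 'A', 'B', 'A']

# Every item marked 'B': three items happen to be right, but no credit is given.
stereotyped = score_alternative_test(['B'] * 6, key)

# An ordinary paper: four right, one wrong, one omitted.
ordinary = score_alternative_test(['A', 'B', 'A', 'A', 'B', None], key)
```

Placing the stereotype check before any counting mirrors the point of the rule: the scorer must look at the pattern of the whole page before crediting single items.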

Ambiguous Responses.—The type of response in which there was 
greatest disagreement between the students as a whole and the writer 
was the case where the child in Test 3 or Test 4 deletes the wrong item 
or items and allows the correct item to stand unmarked, instead of 
underlining the correct item. The writer credits such methods of 
indicating the answer by the child as falling under the general rule, 
“Any clear method of indicating answer is given full credit.” A 
difference of opinion may well exist as to whether this method is 
“clear” or not. Many students argued emphatically that it was not
“clear,” and it is certainly true that the distinction between deleting
and underlining is not great. The next largest difference of opinion
between the “correct” score and the students’ marking was in an
item where the child in underlining a word had run on and included 
about half of the next word. This the writer interpreted as a slip 
of the pencil, whereas the students in general were inclined to interpret 
it as an underlining of the two words. Some of the large differences 
between the writer and his students were due to failure on the part of 
the students to note carefully some of the minor rules of the scoring 
key and to trust to their own judgment instead. 

Influence of Previous Practice in Scoring.—The students were asked 
each time to estimate the number of N. I. T. blanks scored and also the 
total number of standard test blanks, educational and mental, pre- 
viously scored. From these rough estimates we may obtain some 
idea as to the influence of previous practice in scoring on the accuracy 
of the scoring of the experimental blanks. 


AVERAGE NUMBER OF ERRORS ACCORDING TO N. I. T. BLANKS PREVIOUSLY SCORED

Number of blanks     |        Form 1         |        Form 2
previously scored    | Average | Number of   | Average | Number of
                     | errors  | students    | errors  | students
[illegible]          |  11.5   |    23       | [illegible]
[illegible]          |   2.8   |    12       | [illegible]
[illegible]          |  13.3   |    45       | [illegible]
[illegible]          |  13.5   |    59       | [illegible]

There is evidently no decrease in errors accompanying previous
practice in scoring N. I. T. blanks. Those who had previous
to the experiment scored few or none were almost as accurate as those
who had scored many.

AVERAGE NUMBER OF ERRORS ACCORDING TO ALL STANDARD TESTS PREVIOUSLY
SCORED

Number of blanks     |        Form 1         |        Form 2
previously scored    | Average | Number of   | Average | Number of
                     | errors  | students    | errors  | students
1000 and over        |  12.3   |    20       |   7.8   |    25
[illegible]          |  14.2   |    21       |   6.2   |    50
[illegible]          |   9.9   |    20       |         |
None                 |  12.9   |    25       |         |

Practice in scoring papers in general would seem to have no notice- 
able influence upon accuracy in scoring N. I. T. blanks. It would 
seem as if the scoring of standard tests in general might tend to form 
scoring habits and that these persist and are not modified with practice, 
unless particular attention is drawn to them. This would point to the 
necessity for “cooperative scoring,” the checking of other students’
work, and the discussion of differences, because we have seen that 
where this occurs the number of errors is greatly decreased. 

Summary.—An experimental blank made up of many doubtful and 
ambiguous responses shows a wide range of scores allotted by students. 
The deviation from the “true score” is very great. Discussion of the
principles of scoring and the checking of each others’ papers during a 
term’s work on intelligence testing tend to reduce these deviations 
considerably. Previous practice in scoring tests seems not to be 
related to accuracy in scoring this particular experimental blank. 
Those who had previously scored many tests made just as many errors
as those who had scored few or none. 

The great differences between scorers indicate that our scoring 
keys and directions to scorers contain a great many ambiguities. 
If this is the case with the National Intelligence Test, where the 
objectivity of the test is great, it will be truer of other tests that are less 
objective. There is great need for tests with more objective scoring. 

The conditions of the present experiment are artificial in the sense 
that no actual child’s paper would ever contain all the doubtful 
responses which were gathered together in the experimental paper in 
question. There will be much less deviation from the true score in 
the average child’s paper. 

The facts in the present article show that it is necessary to train 
workers in group testing if we are to expect agreement in scoring 
methods, and furthermore, that mere experience in scoring group 
tests is no guarantee of accuracy. 





FURTHER STUDIES ON THE RELIABILITY OF 
READING TESTS 


W. F. CURRENT AND G. M. RUCH 


University of Iowa 


Introduction.—Two major studies have appeared to date on statistical
aspects of certain reading tests.¹ It is the purpose of this paper
to do two things: (a) bring these earlier studies up to date by adding
certain reading tests published more recently, and (b) supply data on
the reliability of certain tests in a more useful form for interpretive
purposes. The value of Gates’ paper is lessened somewhat by the
fact that he did not report the standard deviations of the actual reading
scores. Thus he prevented the calculation of probable errors of
test scores and other measures of more utility than reliability
coefficients. Monroe reports means and standard deviations. He also
gives the “probable errors of measurement,” together with certain
derived measures, particularly the $PE_M$/Average, a measure which
has been called in question.²

Monroe’s findings permit the calculations which the present
paper holds to be useful, although the arrangement of his tables
has caused the present writers some misgivings as to whether they
have paired the r’s and sigmas correctly in bringing together constants
from his different tables.

Since the work of Gates and Monroe a number of new reading tests
have appeared, three of which at least appear to have more than
usual value: the Haggerty Reading Examination, the two reading tests
of the Lippincott-Chapman Classroom Products Survey Test, and the
Stanford Reading Test. These have been included in the present
study along with the Courtis, Monroe, and Thorndike-McCall tests
in order to establish common ground with the two older studies in
question.





1 Gates, A. I.: An Experimental Study of Reading and Reading Tests. Journal 
of Educational Psychology, Vol. XII, 1921, pp. 303-314, 378-391, and 445-464. 
Monroe, W. S.: A Critical Study of Certain Silent Reading Tests.
University of Illinois Bulletin, Vol. XIX, No. 22; Bureau of Educational Research,
Bulletin No. 8, 1922.
2 Franzen, R. H.: Statistical Issues. Journal of Educational Psychology, Vol.
XV, 1924, pp. 367-382.
Ruch, G. M.: Minimum Essentials in Reporting Data on Standard Tests.
Journal of Educational Research, December, 1925, pp. 349-358.

Subjects and Method.—One hundred fifty-four children distributed
approximately equally throughout Grades IV to VIII were used. 
Three small school systems were represented in order to broaden the 
sampling. The results are treated as a unit in the interests of reducing 
the probable errors of the determinations and to avoid the averaging 
of coefficients of reliability as Gates was forced to do. The testing 
was distributed over a period of three school weeks so that an interval 
of a day between testings could be allowed. 

The Tests Studied.—The tests and the order of succession of tests 
and forms were as follows: 

1. Monroe’s Standardized Silent Reading Test, Test II, Form 1.
2. The above, Form 2.
3. Courtis Silent Reading Test No. 2, Form 1.
4. The above, Form 2.
5. Stanford Reading Test, Form A.
6. The above, Form B.
7. Thorndike-McCall Reading Scale, Form 1.
8. The above, Form 2.
9. Lippincott-Chapman Reading Test.
10. Haggerty Reading Examination, Sigma 3, Form A.
11. The above, Form B.

Every paper was scored twice independently. Definite rules were 
drawn up for scoring responses in tests not completely objective. 
Answers of the yes-no type were read aloud to the scorers. 

Treatment of Results.—The reliability coefficients, except in the
case of the Lippincott-Chapman test, were figured by the correlation
of the two forms. In the one case excepted, the method was the
correlation of “odds” and “evens” “stepped up” by the Spearman-
Brown formula.
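For the one test having a single form, the procedure amounts to correlating each pupil's score on the odd-numbered items with his score on the even-numbered items, and then stepping that coefficient up to the full length of the test. A minimal Python sketch follows. The function names and the item records are invented for illustration and are not data from the present study.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation of two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(item_scores):
    """item_scores: one list of item scores (1 right, 0 wrong) per pupil."""
    odds = [sum(p[0::2]) for p in item_scores]   # items 1, 3, 5, ...
    evens = [sum(p[1::2]) for p in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odds, evens)
    # Spearman-Brown formula for a test of double length.
    return 2 * r_half / (1 + r_half)

# Invented records: five pupils, eight items each.
pupils = [
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
]
reliability = split_half_reliability(pupils)
```

The stepped-up value is always larger than the half-test correlation when that correlation is positive, since each half is only half as long as the test whose reliability is wanted.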

Table I summarizes the reliability coefficients, together with all
other pertinent data. Two measures of the reliability of individual
scores are given: (1) the probable error of a raw score,¹ and (2) the
probable error of a true score. These are expressed, in turn, as fractions of
the standard deviation.

1 This measure is what Monroe terms the probable error of measurement, i.e.,
$PE_M = .6745\,\sigma\sqrt{1 - r_{12}}$. Kelley writes the same formula as
$PE_{1\cdot I} = .6745\,\sigma_1\sqrt{1 - r_{1I}}$. The second statement has one
important advantage in that it calls attention to the fact that the meaning of
this probable error is “the probable error of estimate of a fallible score
estimated from a second fallible score considered as a true score.”

Following Kelley’s recommendation,¹ the ratio $PE_{\infty\cdot 1}/\sigma_1$ is
held to be the best single statement for the reliability of a test.

The ratio $PE_M/\sigma$ is not as satisfactory as the preceding ratio, due
to the fact that the numerator term does not allow for regression effects
in the scores. As is well known, the probable error of a raw score is
difficult of statement because the error of an obtained score differs
in different parts of the range of scores, being larger for scores deviating
large distances from the mean than for scores near the mean. Kelley’s
use of estimated true scores, i.e., regressed scores, obviates this objection
to Monroe’s recommended measure.

However, the probable error formula involving the $\sqrt{1 - r}$ term
has been used a great deal and does possess considerable meaning.
For this reason, both sets of determinations have been reported in
Table I.

The last column of Table I is suggested as the most meaningful 
expression of these reliabilities. It will be noted that a marked range 
of values was found, viz., .17 to .31. In words, the range of probable 
errors of estimated true scores is from .17 of one SD to .31 SD. The 
most reliable test has a probable error roughly one-half that of the 
least reliable test in the list. 
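Once they are expressed as fractions of the standard deviation, both measures depend on the reliability coefficient alone. The Python sketch below assumes the usual forms: $.6745\sqrt{1 - r}$ for the raw score and, following Kelley, $.6745\sqrt{r - r^2}$ for the estimated true score, the true score being estimated by regressing the obtained score toward the mean. The function names are hypothetical, and the two reliabilities used are the extremes of the range reported in this study.

```python
from math import sqrt

def pe_raw_score_ratio(r):
    """Probable error of a raw score, as a fraction of the standard deviation."""
    return 0.6745 * sqrt(1 - r)

def pe_true_score_ratio(r):
    """Probable error of an estimated true score, as a fraction of the SD."""
    return 0.6745 * sqrt(r - r * r)

def estimated_true_score(x, r, mean):
    """Regressed score: the obtained score x pulled toward the group mean."""
    return r * x + (1 - r) * mean

# Highest and lowest reliability coefficients found among the tests studied.
best = round(pe_true_score_ratio(0.93), 2)
poorest = round(pe_true_score_ratio(0.71), 2)
```

With these inputs the two ratios come out at .17 and .31, the limits of the range quoted above, which lends some support to the assumed form of the true-score formula.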

Present results cannot be compared directly with those of Monroe 
and Gates since but two of the tests are common to all three studies 
(the Monroe and Courtis tests). On the whole, our results show some- 
what greater reliability for the Monroe tests than was found by either 
Monroe or Gates. Monroe found higher reliability for the Courtis 
tests for both rate and comprehension than did either Gates or the 
present authors. These comparisons of the three studies are not worth 
much consideration since grade ranges are not comparable. 

1 Kelley, T. L.: Statistical Method, p. 215. Kelley actually writes σ instead
of PE in the numerator but the interpretation is the same in either case. The
$PE_{\infty\cdot 1}$ is the probable error of a true score when estimated by
$\bar{X}_\infty = r_{1I} X_1 + (1 - r_{1I}) M_1$.

The fact that the three most recently published tests all stand well
above the older tests may be significant. However, the recent
tendency is to make the reading tests longer in point of actual working
time, as the following tabulation shows:

Test                     |  r   | Time,   | Estimated r for
                         |      | minutes | constant time of
                         |      |         | 30 minutes
Stanford                 | .93  |   40    |      .92
Lippincott-Chapman       | .89  |   38    |      .87
Thorndike-McCall         | .75  |   30    |      .75
Haggerty                 | .83  |   28    |      .84
Courtis:
  Comprehension          | .74  |    5    |      .94
  [illegible]            | .77  |    5    |      .96
Monroe:
  Comprehension          | .76  |    4    |      .96
  [illegible]            | .71  |    4    |      .95

The estimated r’s for 30 working minutes were obtained by the
use of the Spearman-Brown prophecy formula. There is, of course,
limited credence to be placed in such values in the cases of the Monroe
and Courtis tests since the estimation of reliabilities for the equivalent
of the averages of 7½ and 6 forms, respectively, is bound to contain
large possibilities for error. The shorter tests (Monroe and Courtis)
seem to have slightly greater reliability per minute of actual working
time but, of course, this is not of great practical moment since, after
all, users of tests ordinarily give a single form of a test and, right or
wrong, their judgment will be subject to the errors incident to brief
testing. It is not likely that the average schoolman will be willing
to give four or five forms of a brief test. Hence, the longer tests do
have a certain advantage in that they force the user into more reliable
measurement.

Table II gives the intercorrelations of all tests used in the present
study, comprehension scores only being included.

TABLE II.—INTERCORRELATIONS OF COMPREHENSION SCORES (N = 154)
(The probable error is entered beneath each coefficient.)

Tests                         |  1   |  2   |  3   |  4   |  5   |  6   |  7   |  8   |  9   |  10  |  11
 1. Stanford, Form A          |
 2. Stanford, Form B          | .93
                              | .010
 3. Haggerty, Form A          | .88  | .89
                              | .011 | .011
 4. Haggerty, Form B          | .88  | .82  | .83
                              | .012 | .017 | .017
 5. Thorndike-McCall, Form 1  | .80  | .78  | .71  | .67
                              | .019 | .021 | .026 | .030
 6. Thorndike-McCall, Form 2  | .71  | .80  | .68  | .70  | .75
                              | .027 | .019 | .029 | .027 | .023
 7. Monroe, Form 1            | .78  | .76  | .79  | .74  | .48  | .61
                              | .021 | .022 | .019 | .024 | .041 | .033
 8. Monroe, Form 2            | .76  | .76  | .75  | .69  | .53  | .55  | .76
                              | .023 | .023 | .023 | .028 | .038 | .037 | .022
 9. Courtis, Form 1           | .61  | .62  | .58  | .57  | .51  | .43  | .60  | .53
                              | .034 | .033 | .036 | .036 | .039 | .044 | .033 | .038
10. Courtis, Form 2           | .64  | .65  | .61  | .52  | .59  | .41  | .52  | .52  | .74
                              | .032 | .031 | .034 | .038 | .035 | .044 | .039 | .039 | .024
11. Chapman                   | .68  | .68  | .69  | .70  | .59  | .58  | .58  | .63  | .53  | .52  | .89
                              | .029 | .029 | .028 | .027 | .035 | .035 | .036 | .032 | .038 | .040 | .011
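The estimates for a constant working time of 30 minutes, discussed above, follow from the Spearman-Brown prophecy formula with the lengthening factor taken as 30 divided by the working time of the test. The Python sketch below shows the computation; the function names are hypothetical, and the two coefficients used are those found here for the comprehension scores of the Courtis and Monroe tests.

```python
def spearman_brown(r, n):
    """Reliability predicted for a test n times as long as the one yielding r."""
    return n * r / (1 + (n - 1) * r)

def reliability_at_30_minutes(r, minutes):
    """Step a reliability up (or down) to 30 minutes of working time."""
    return spearman_brown(r, 30 / minutes)

# A 5-minute test needs 6 forms, and a 4-minute test 7.5 forms, to fill 30 minutes.
courtis = round(reliability_at_30_minutes(0.74, 5), 2)
monroe = round(reliability_at_30_minutes(0.76, 4), 2)
```

The same function serves for shortening a test: a factor below 1, as for a 40-minute test reduced to 30 minutes, lowers the predicted reliability.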


SUMMARY AND CONCLUSIONS 


1. The present study was intended to bring the earlier studies of 
Gates and Monroe up to date by the inclusion of three recent reading 
tests. 

2. The six tests studied showed relatively wide differences in reli- 
ability when given to the same group of 154 pupils in Grades IV to 
VII, viz., .71 to .93. 

3. Probable errors of estimated true scores were computed which 
showed that certain tests yield errors of measurement nearly twice as 
large as the best tests available. 

4. Recent reading tests are more reliable than the older ones,
probably due to greater length in terms of working time. In view of
the reliabilities found, four- and five-minute tests are not very
satisfactory as measuring instruments. Probably 30 to 40 minutes is
the minimum time needed for accurate measurement of reading ability.

5. Four or five forms of the shorter reading tests, if given and 
averaged, would yield reliabilities equal to the best tests studied. 

6. Intercorrelations of reading test scores were presented. 

















RELATION OF INTELLIGENCE TO TRAIT 
CHARACTERISTICS 


W. HARDIN HUGHES 
Director of Research and Guidance, Pasadena City Schools 


Pasadena, California 


INTRODUCTORY 


Occasionally we hear a statement to the effect that there is little
or no relation between “general intelligence” levels, as determined
by standardized tests, and the possession of those traits which are more
or less desirable for social and individual success. Sometimes we hear
the equally untenable claim that the “intelligence quotient” is sufficient
for predicting almost any social and personal characteristics
the individual may acquire. Needless to say, neither of these extreme
opinions is held by those who have gone most thoroughly into the
study of tests and other forms of measurement. Most of us, in fact,
have long since reached the conclusion that there are many highly
important elements in a person’s make-up not at all revealed by the
“general intelligence” tests. It is equally true, however, that
certain desirable traits appear most frequently in persons whose
“intelligence quotients” are high, and scarcely ever in those whose
“intelligence quotients” are low.


METHOD EMPLOYED IN THIS STUDY


The data relative to trait characteristics used in this study were 
obtained through an organized plan for the continuous rating of stu- 
dents in the Pasadena junior and senior high schools. The method 
has been described elsewhere.¹ Briefly, the scale used in this connection
defines in terms of typical behavior each of twelve important
traits; sets forth the technique of rating individuals in accordance with 
the method of relativity of degree; and provides for the pooling and 
accumulating of ratings so that a composite record is obtained each 
year based on the judgments of all teachers with whom the student 
has carried work. Each student included in the following studies, 
therefore, has been rated independently by five or six different teachers 
in any single year. The Terman Group Tests of Mental Ability were 
used in obtaining the intelligence quotient. 





1 Hughes, W. Hardin: A Rating Scale for Individual Capacities, Attitudes and
Interests. Journal of Educational Method, Vol. III, October, 1923, pp. 56-65.

RELIABILITY OF METHOD


Any measuring instrument is valuable and reliable just to the 
extent to which it always gives the same or nearly the same results. 
If a scale in the hands of one user registers high but in the hands of 
another low, its reliability is slight. For the measurement of physical
quantities, scales have been devised whose reliability is exceedingly 
high. The yard stick, for example, gives the same results whether 
applied by one competent person or by another. Its reliability is or 
may be practically 100 per cent. If, however, the person who uses 
the yard stick is careless in applying it, or the thing measured actually
varies in length from time to time, the consistency of measurement
will, necessarily, be reduced.

Now, in the field of personality measurement, we shall never 
be able to devise a scale that will possess 100 per cent of reliability. 
Even if the scale were perfect, we should find that the traits of person- 
ality are variable. It is human nature for any individual to appear 
differently in unlike situations. In one classroom, for example, a 
student may evidence very superior initiative, while in another class- 
room he is only average. But the situations are different. The sub- 
jects, the methods, the teachers, and many other factors help to 
account for this difference. Consequently, the student should not 
be expected to remain constant. His personality traits are neces- 
sarily variable. 

It has been found, however, that, though variable, a given trait 
fluctuates about a rather constant point of degree. An individual 
may, occasionally, register very high or very low with respect to a 
specific quality but most frequently he will not be found in either of 
these extremes. This fact explains in large measure the comparative 
reliability of pooled judgments. 

With this in mind it will be of interest to note the consistency of 
composite ratings from year to year. In Table I we have shown the 
tendency for students to remain in the same fifth of the rating scale 
although rated by a different set of teachers each year. It will be 
noted that the majority of these students, specifically 84.6 per cent, 
received the same rating or a rating within one-half step from the 
initial rating. These were eighth, ninth, and tenth year students
most of whom during their first year’s rating were freshmen in a high
school of 3000 students.

TABLE I.—EXTENT TO WHICH STUDENTS REMAIN IN SAME PART OF SCALE FROM
YEAR TO YEAR

                           | 1 plus | 1 step | ½ step | Same   | ½ step | 1 step | 1 plus
                           | lower  | lower  | lower  | rating | higher | higher | higher
Control of attention       |   1    |   8    |   18   |  148   |   46   |   28   |   4
Force of personality       |   0    |   7    |   20   |  150   |   56   |   17   |   3
Quickness of thought       |   0    |   8    |   32   |  128   |   49   |   27   |   9
Trustworthiness            |   2    |   5    |   38   |  135   |   50   |   16   |   7
Retentiveness of memory    |   0    |   6    |   30   |  135   |   50   |   27   |   5
Initiative-aggressiveness  |   0    |   8    |   30   |  125   |   50   |   21   |   9
Regularity-persistency     |   2    |   6    |   17   |  135   |   60   |   23   |  10
Respect for authority      |   1    |   7    |   32   |  130   |   50   |   25   |   8
Sense of accuracy          |   1    |  10    |   20   |  125   |   60   |   29   |   8
Self-confidence            |   1    |   8    |   17   |  149   |   48   |   19   |  11
Cooperativeness            |   0    |   8    |   25   |  127   |   55   |   32   |   6
Group leadership           |   2    |   1    |   39   |  128   |   54   |   10   |   9
Average                    | [illegible]     |   27   |  135   |   52   |   23   |   7
Per cent                   |  .4    |  3.1   |  10.7  |  53.3  |  20.6  |  9.1   |  2.7




















TABLE II.—CORRELATION OF RATINGS FROM YEAR TO YEAR (253 STUDENTS)

Traits                       | Correlation | Probable error
Control of attention         |    .64      |     .024
Force of personality         |    .63      |     .025
Quickness of thought         |    .61      |     .027
Trustworthiness              |    .59      |     .027
Retentiveness of memory      |    .59      |     .027
Initiative-aggressiveness    |    .56      |     .029
Regularity-persistency       |    .55      |     .029
Respect for authority        |    .55      |     .029
Sense of accuracy            |    .54      |     .030
Self-confidence              |    .53      |     .031
Cooperativeness              |    .51      |     .032
Group leadership             |    .47      |     .033

In Table II we have given the correlations (Pearson Method) of
trait ratings for this same group of 253 students. It will be noted
that the correlation from year to year is highest for control of attention
and lowest for group leadership. Incidentally, it may be stated
that all of these correlations are as high as those ordinarily found from
year to year for academic grades and several of them are considerably
higher. Considering the fact that during the first year of these ratings
teachers were just becoming accustomed to the use of the scale, and
that most of these students who were then freshmen were comparative
strangers in a large high school of approximately 3000 enrollment,
these correlations are as high as should be expected.
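The probable errors entered beside the correlations in Table II are not derived in the text. The Python sketch below assumes the conventional formula for the probable error of a product-moment coefficient, $.6745(1 - r^2)/\sqrt{N}$, and applies it to the highest and lowest coefficients for the 253 students; the function name is hypothetical.

```python
from math import sqrt

def probable_error_of_r(r, n):
    """Conventional probable error of a correlation r based on n cases."""
    return 0.6745 * (1 - r * r) / sqrt(n)

# Lowest and highest year-to-year correlations reported for the 253 students.
pe_lowest = round(probable_error_of_r(0.47, 253), 3)
pe_highest = round(probable_error_of_r(0.64, 253), 3)
```

Under this assumption the tabled values are reproduced to within a unit in the third decimal place, the small discrepancies presumably arising from the rounding of the coefficients themselves.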

A somewhat closer agreement of ratings from year to year was 
found in a junior high school of 900 enrollment. Table III is for 79 
students who graduated from this school in the spring of 1924. 


TABLE III.—CORRELATIONS OF RATINGS FROM YEAR TO YEAR (79 STUDENTS)

Traits                       | Correlation
[illegible]                  |    .74
[illegible]                  |    .71
[illegible]                  |    .68

The higher correlations here shown may be due partly to the fact
that the students had been in the school longer and, because of the
smaller enrollment of the school, were better known by their teachers.

One may ask whether this high agreement of composite ratings 
from year to year is in any way due to teachers having access to pre- 
vious ratings. An examination of the scores given by individual 
teachers for this same group of students answers the question in the 
negative. Very wide variation of individual ratings is found; so
wide, in fact, as to free teachers from any accusation of having
copied scores.

In another paper¹ the writer has shown the correlation of trait
with trait for a group of 450 high school seniors. While the “fallacy
of the halo” in such ratings, as pointed out by Thorndike, no doubt
affects the judgments rendered, our studies show that ratings on
certain pairs of traits have twice as high a correlation as have ratings
on certain other pairs. The correlation of force of personality
and leadership, for example, was .83 (Pearson’s formula). Regularity-persistency
and group leadership, on the other hand, showed a correlation
of only .41. If the “fallacy of halo” completely invalidated such
ratings, we should not expect to find such differences among correlations.
This fact, however, does not tell us to what extent the “fallacy
of halo” modifies the rating given for any single trait.

¹ General Principles and Results of Rating Trait Characteristics. The Journal
of Educational Method, Vol. IV, June, 1925, pp. 421-431.


RELATION OF “INTELLIGENCE” TO TRAITS AND ATTITUDES


The accompanying figures and tables indicate the extent to which 
students in the various intelligence levels were rated superior, medium, 
or inferior in the traits of our rating scale. Ten hundred and thirty 
students are included in each figure. 

Let us examine Fig. 1. It will be seen at a glance that students 
of 130 IQ or better are very superior in each case as compared with 
those in the lowest intelligence level. For example, 57 per cent of the 
130 IQ students are rated superior in quickness of thought, while only 
5 per cent of the 80 IQ students have this rating. Turning to the 
left side of this part of the chart, we find that none of the 130 IQ stu- 
dents are rated either inferior or lowest in quickness of thought, while 
35 per cent of the 80 IQ students are rated inferior and 17 per cent 
lowest. Examining the order in which the seven intelligence level 
lines are located in the chart, we find that the order is practically 
the same as that given in the legend; that is to say, the higher the 
intelligence of the students the larger is the percentage falling in the 
higher rating levels.
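
The ordering described above can be checked against the quickness-of-thought proportions printed in Table IV. The sketch below is illustrative only: the dictionary name and helper function are hypothetical, and decimal points that appear to have dropped out of the printed table are restored on the assumption that each row is a set of proportions summing to about 1. It adds the "superior" and "highest" columns for each intelligence level.

```python
# Proportions rated Lowest, Inferior, Medium, Superior, Highest in
# quickness of thought, by IQ level (Table IV, Trait 1).
QUICKNESS_OF_THOUGHT = {
    "130 up": (0.000, 0.000, 0.261, 0.571, 0.166),
    "120's": (0.000, 0.078, 0.431, 0.382, 0.107),
    "110's": (0.000, 0.112, 0.519, 0.335, 0.039),
    "100's": (0.013, 0.186, 0.522, 0.258, 0.020),
    "90's": (0.033, 0.220, 0.575, 0.170, 0.004),
    "80's": (0.080, 0.352, 0.507, 0.051, 0.007),
    "80-": (0.175, 0.350, 0.400, 0.075, 0.000),
}


def share_superior_or_highest(row):
    """Sum of the last two columns (Superior + Highest) of one table row."""
    return row[3] + row[4]


shares = {
    level: round(share_superior_or_highest(row), 3)
    for level, row in QUICKNESS_OF_THOUGHT.items()
}

for level, share in shares.items():
    print(f"{level:>7}: {share:.3f}")
```

On these figures the share falls steadily from the 130-and-up level through the 80's; only the small lowest group departs slightly from the order, which fits the statement that the order is "practically" that of the legend.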

A similar fact may be seen in the distributions for each of the other 
traits. In nearly all, the heavy black line representing students of 
130 IQ or better stands to the right of the others indicating the strong 
tendency for intellectually superior students to receive superior or 
highest ratings in the several traits. It will be noted, however, that 
these traits are arranged in the descending order of correlation with 
intelligence—quickness of thought having the greatest correlation with 
intelligence and respect for authority, the lowest. Interpretation of 
these charts will be facilitated if the reader keeps in mind that coin- 
cidence of lines means less correlation and scattering of lines means 
greater correlation. One may easily see, therefore, that certain very
important traits, namely, cooperativeness, regularity and persistency,
trustworthiness, and respect for authority, have the smallest correlation
with intelligence. From the social viewpoint, this is a very encouraging
relationship, since these are traits very essential to good citizenship.


TABLE IV.—SHOWING PER CENT OF STUDENTS IN EACH INTELLIGENCE LEVEL,
RATED AS INDICATED

(1030 Junior and Senior High School Students)
(L, lowest; I, inferior; M, medium; S, superior; H, highest)

Trait 1.—Quickness of Thought

IQ          L       I       M       S       H
130 up    .000    .000    .261    .571    .166
120’s     .000    .078    .431    .382    .107
110’s     .000    .112    .519    .335    .039
100’s     .013    .186    .522    .258    .020
90’s      .033    .220    .575    .170    .004
80’s      .080    .352    .507    .051    .007
80-       .175    .350    .400    .075    .000
Correlation .42.

Trait 2.—Retentiveness of Memory

IQ          L       I       M       S       H
130 up    .000    .000    .261    .547    .190
120’s     .000    .029    .401    .431    .137
110’s     .016    .083    .402    .446    .050
100’s     .017    .154    .518    .271    .037
90’s      .020    .241    .575    .150    .012
80’s      .110    .227    .536    .117    .007
80-       .025    .325    .550    .100    .000
Correlation .40.

Trait 3.—Personality

IQ          L       I       M       S       H
130 up    .000    .047    .238    .595    .119
120’s     .009    .049    .460    .411    .068
110’s     .011    .067    .519    .368    .033
100’s     .017    .144    .549    .268    .020
90’s      .020    .208    .575    .191    .004
80’s      .058    .286    .573    .073    .007
80-       .100    .300    .550    .050    .000
Correlation .37.

Trait 4.—Initiative

IQ          L       I       M       S       H
130 up    .000    .023    .238    .571    .166
120’s     .000    .068    .460    .401    .068
110’s     .005    .100    .497    .374    .022
100’s     .010    .168    .567    .213    .041
90’s      .020    .212    .625    .137    .004
80’s      .058    .257    .595    .088    .000
80-       .075    .300    .500    .100    .025
Correlation .36.

Trait 5.—Control of Attention

IQ          L       I       M       S       H
130 up    .000    .023    .309    .500    .166
120’s     .000    .068    .401    .401    .127
110’s     .016    .067    .474    .391    .050
100’s     .024    .175    .470    .298    .030
90’s      .020    .254    .504    .216    .004
80’s      .088    .257    .536    .110    .007
80-       .050    .200    .600    .150    .000
Correlation .34.

Trait 6.—Leadership

IQ          L       I       M       S       H
130 up    .000    .119    .309    .500    .071
120’s     .007    .088    .539    .294    .068
110’s     .005    .117    .592    .262    .022
100’s     .024    .182    .591    .189    .013
90’s      .041    .258    .604    .091    .004
80’s      .088    .352?   .522    .029    .007
80-       .125    .450    .400    .025    .000
Correlation .33.

Trait 7.—Self-confidence

IQ          L       I       M       S       H
130 up    .000    .023    .261    .500    .214
120’s     .019    .058    .431    .401    .088
110’s     .011    .061    .497    .374    .055
100’s     .010    .116    .553    .305    .013
90’s      .016    .195    .608    .158    .020
80’s      .044    .235    .617    .095    .007
80-       .025    .200    .600    .150    .025
Correlation .33.

Trait 8.—Sense of Accuracy

IQ          L       I       M       S       H
130 up    .000    .047    .261    .547    .142
120’s     .000    .098    .401    .352    .147
110’s     .000    .089    .491    .379    .039
100’s     .020    .164    .512    .268    .034
90’s      .025    .221    .575    .163    .017
80’s      .051    .220    .602    .117    .007
80-       .050    .175    .625    .150    .000
Correlation .32.

Trait 9.—Cooperation

IQ          L       I       M       S       H
130 up    .000    .000?   .214    .666    .119?
120’s     .009    .039    .401    .441    .107
110’s     .011    .061    .413    .463    .050
100’s     .010    .068    .512    .350    .058
90’s      .004    .095    .587    .279    .033
80’s      .022    .154    .588    .227    .007
80-       .050    .150    .575    .225    .000
Correlation .27.

Trait 10.—Regularity-persistency

IQ          L       I       M       S       H
130 up    .000    .047    .261    .523    .166
120’s     .000    .098    .411    .392    .098
110’s     .016    .072    .480    .402    .027
100’s     .020    .099    .522    .312    .044
90’s      .037    .158    .575    .216    .012
80’s      .066    .191    .522    .191    .029
80-       .025    .200    .575    .200    .000
Correlation .26.

Trait 11.—Trustworthiness

IQ          L       I       M       S       H
130 up    .000    .023    .261    .500    .214
120’s     .009    .049    .352    .421    .166
110’s     .011    .055    .374    .463    .094
100’s     .010    .085    .457    .378    .069
90’s      .013    .100    .492    .353    .041
80’s      .021    .147    .493    .309    .027
80-       .000    .175    .425    .375    .025
Correlation .22.

Trait 12.—Respect for Authority

IQ          L       I       M       S       H
130 up    .000    .000    .095    .690    .214
120’s     .000    .049    .274    .470    .205
110’s     .011    .039    .307    .530    .111
100’s     .013    .058    .316    .522    .089
90’s      .000    .050    .379    .500    .070
80’s      .029    .088    .352    .463    .066
80-       .000    .125    .375    .450    .050
Correlation .17.





OTHER INDICATIONS OF RELIABILITY 


The reliability of our method for finding the relationship of traits 
and intelligence is partly indicated by the ranking of correlations in 
different schools where two groups were rated independently. These 
combined made up the 1030 students mentioned above. Five hundred 
and eighty of these were in Grades VII, VIII, and IX of a junior high 
school; the remaining four hundred and fifty were ninth graders in
a senior high school. The remarkable agreement between the two
sets of coefficients is shown in Table V. The agreement is statistically
expressed by the reliability coefficient of 0.89. That is to say,
whatever relationship was found for trait ratings and intelligence in
one school was found to the extent of 0.89 in the other school.


TABLE V.—CORRELATION OF INTELLIGENCE AND TRAIT RATINGS IN TWO SCHOOLS

                               Senior high   Junior high   Senior high   Junior high
Traits                           school        school        school        school
                               coefficients  coefficients     rank          rank
Quickness of thought..........     .42           .45            1             1
Retentiveness of memory.......  [illegible]      .43            2             2
Force of personality..........     .37           .38            3             5
Capacity of leadership........     .35           .41            4             3
Initiative-aggressiveness.....     .34           .40            5             4
Control of attention..........     .33           .37            6             6
Self-confidence...............     .31           .35            7             8
Sense of accuracy.............     .29           .36            8             7
Cooperativeness...............     .27           .28            9            10
Regularity-persistency........     .24           .30           10             9
Trustworthiness...............     .22           .17           11            12
Respect for authority.........     .13           .23           12            11
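
The agreement between the two rank columns of Table V can be illustrated with a rank-difference check. This is a sketch only: the article does not state how its reliability coefficient of 0.89 was computed, and the Spearman rank-difference formula used here is a different statistic that need not reproduce that figure. The variable and function names are hypothetical.

```python
# Ranks of the twelve traits as printed in Table V, in table order.
senior_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
junior_rank = [1, 2, 5, 3, 4, 6, 8, 7, 10, 9, 12, 11]


def spearman_rho(rank_a, rank_b):
    """Spearman rank-difference coefficient for untied ranks:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rank lists of equal length with at least 2 items")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


sum_d2 = sum((a - b) ** 2 for a, b in zip(senior_rank, junior_rank))
rho = spearman_rho(senior_rank, junior_rank)
print(f"sum of squared rank differences = {sum_d2}, rho = {rho:.3f}")
```

No trait shifts by more than two places between the two schools, so the rank-order agreement comes out high, which is in keeping with the author's description of the agreement as remarkable.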

Incidentally, these figures show in part that composite ratings are 
not as haphazard and meaningless as some have imagined. Such a 
consistent relationship to an objectively determined factor—intelli- 
gence quotient—could not reasonably be anticipated if teachers’ 
pooled judgments had been worthless. 

It will be noted, also, that in each of the three sets of correlations, 
quickness of thought and memory stand at the top. This, no doubt, 
is due to the fact that these two traits are important elements of intelli- 
gence and should, therefore, be expected to correlate most highly with 
intelligence quotients. Those traits which, on the other hand, are 
more or less dynamic in nature fall consistently to the bottom of the 
series when ranked with respect to correlations with intelligence. 
This fact has a significant bearing on a number of important social 
problems. 

It is not claimed by the writer that the best and most reliable 
method of rating trait characteristics has been used in these studies. 
He has only attempted to present some of the results obtained from 
the method employed. 
NOTES ON ARTICLES IN EDUCATIONAL
PSYCHOLOGY IN CURRENT ISSUES OF
OTHER MAGAZINES

REPORTED BY C. O. MATHEWS
Graduate Student, Teachers College
Columbia University











CASE STUDIES


Frank, The Boy Who Couldn’t Be Good. Hugh Kelly. The Training School 
Bulletin, Mar., 1926, 166-172. This is a case study of a feeble-minded delinquent 
boy and an account of methods employed to reform him. 

II Peter; The Beginnings of the Juvenile Court Problem. Helen T. Wooley. 
The Pedagogical Seminary and Journal of Genetic Psychology, Mar., 1926, 9-29. 
A case study showing the way delinquent tendencies get started in young childhood.

The Downey Will-temperament Profile in Personality Studies of Juvenile Delin- 
quents. Emily Wires. The Journal of Abnormal and Social Psychology, Jan., 
1926, 416-440. Ten cases are reported in detail. 

The Kindergarten as a Mental Hygiene Agency. Arnold Gesell. Mental 
Hygiene, Jan., 1926, 27-37. Case studies of exceptional kindergarten children are
reported with suggestions for special guidance in procedures. 

A Speech Clinic Case with Misconduct as a By-product. Herman H. Young. 
The Journal of Applied Psychology, Dec., 1925, 371-381. The diagnosis and 
treatment of the case is given in detail. Improvement was very rapid. 


MISCELLANEOUS 


Thorndike’s Contributions to Psychology and Education. Teachers College 
Record, Feb., 1926, 516-575. Professor Thorndike’s contributions in each of the 
various fields of psychology and education are summarized by a leader in this field. 

Annotated Chronological Bibliography of Publications by E. L. Thorndike.
Teachers College Record, Feb., 1926, 466-515. This is a complete bibliography, 
including books, articles and reviews. 

Study Habits of High School Pupils. Percival M. Symonds. Teachers College 
Record, Apr., 1926, 713-724. The study habits of five good students and five poor 
students were carefully observed and measured insofar as this was possible. 

A Modern Systematic vs. an Opportunistic Method of Teaching. Arthur I. Gates 
assisted by Mildred I. Batchelder and Jean Betzner. Teachers College Record, 
Apr., 1926, 679-700. A comparison of the information, skills, attitudes and habits 
of two groups taught by different methods. 


MEASUREMENTS—GENERAL REFERENCES 


What We Are Failing to Measure in Education. Herbert A. Toops. Journal of 
Educational Research, Feb., 1926, 118-128. A suggested plan for cooperative 
research in education. 
Standard Deviation vs. Age as a Score Unit. G. M. Wilson. Journal of Educational
Research, Mar., 1926. A comparison of the advantages of standard scores,
age scores and percentiles.

A Simple Apparatus Which Gives Tests and Scores. S. L. Pressey. School and
Society, Mar. 20, 1926, 373-376. A description of the apparatus is given and its
possibilities discussed. 

Criteria of a Standardized Test. G. M. Wilson. Educational Review, Mar.,
1926, 138-141. The author enumerates and discusses primary and secondary
criteria. 

Factors in Mental and Scholastic Ability. H. G. Stead. British Journal of 
Psychology, Jan., 1926, 199-221. This study is an analysis of factors in mental 
and scholastic ability as measured by mental, motor and scholastic tests and 
character ratings. 

Empirical Data on the Scoring of True-false Tests. Donald G. Patterson and 
T. A. Langlie. The Journal of Applied Psychology, Dec., 1925, 339-348. The 
number-right method of scoring is shown to be more reliable than the
right-minus-wrong method.


EDUCATIONAL MEASUREMENTS 


Handwriting Survey to Determine Grade Standards. John G. Kirk. Journal of 
Educational Research, Mar., 1926, 181-188. In the opinion of 100 competent 
judges, quality 60 on the Gettysburg Edition of the Ayres Scale is adequate for 
social correspondence. 

A Test of Achievement in College Chemistry and Results Obtained by Its Use with 
Both High School and College Classes. Fred C. Mabee. Journal of Chemical 
Education, Jan., 1926, 70-76. This is a report of the construction of the test and 
the results of its administration in three colleges and three high schools. 

Experiment in Testing Appreciation. Helen V. Ruhlen. The English Journal, 
Mar., 1926, 202-209. The experimenter makes use of the Abbott and Trabue Test 
to determine the place such poems as “L’Allegro” should have in a high-school
English course. 

Hotz Algebra Scales in the Pacific Northwest. Walter Crosby Eells. Mathe- 
matics Teacher, Nov., 1925, 418-427. A report is given of the results of about 4000 
students in 67 schools. Results are summarized by states, tests used, size of 
school and sex. 

Use of the Inventory Test in Plane Geometry. Leonard D. Haerter. Mathe- 
matics Teacher, Mar., 1926, 147-154. This test is used to determine the amount 
of information about plane geometry possessed by beginning pupils. The test 
is contained in the article. 

Reports of the Nation-wide Testing Survey in Problem-solving, English and 
Reading. These reports are printed separately in pamphlet forms and will be 
supplied free of charge by the Public School Publishing Co., Bloomington, Ill. 

A Study of the Intelligence and Achievement of the June, 1925, Graduating Class 
of the Grover Cleveland High School, St. Louis, II. W. D. Shewman. The School
Review, Mar., 1926, 219-226. Comparisons are made between records of boys 
and girls and between two sets of records obtained on the same students seven 
semesters apart. 
The Value of Tests of Intelligence and Achievement Tests for Individual Diagnosis.
Agnes L. Rogers. Childhood Education, Mar., 1926, 317-321. An account is 
given of diagnosis and remedial treatment by use of tests in Los Angeles. 

The Revised Norms for the Range of Information Test in Science. Elliott R. 
Downing. School Science and Mathematics, Feb., 1926, 142-143. These norms 
are for the revised Range of Information Test which consists of 60 words and 
phrases which stand for concepts in various fields of science. 

Information Exercises in Biology. J. L. Cooprider. School Science and 
Mathematics, Nov., 1925, 807-13. These exercises are based upon the vocabulary 
and terms used in nine widely used texts. 

A Study of Achievement and Subject-matter in General Science. August Dvorak. 
General Science Quarterly, Mar., 1926, 445-474. This is a report of the reliability 
of the General Science Scale and some comparative study of achievements based 
on the items used in constructing the scale. 

Individual Diagnosis in Written Composition. Matthew H. Willing. Journal 
of Educational Research, Feb., 1926, 77-89. An experimental comparison is made 
between the use of original compositions and proof-reading tests as means of
diagnosis.
CONDUCTED BY JOHN HOCKETT
The Lincoln School of Teachers College 











NATURE vs. NURTURE 


The Influence of Nurture upon Native Differences, by T. L. Kelley, 
New York: The Macmillan Co., 1926. 


Not in a long time have I found a book so stimulating and so full 
of new points of view—or so difficult to read. It seems unfortunate 
that a book which carries so important a message should have to be 
cloaked in such a difficult style. Almost every sentence is packed 
with meaning. Many of the technical terms are rigorously defined 
in statistical terms and these intricate concepts must be carried in 
mind to follow the argument. Only too infrequently does Kelley 
break through the lines of his statistical argument and let us see the 
educational and social significance of his findings. When he does this 
his enthusiasm carries him into forms of expression at times vivid, 
at times sublime, at times ridiculous, such as “mental processes as
straightforward, ingenious, and brilliant as a tender blade of grass
with the morning dew upon it.”

The problem of nature and nurture is a stubborn one and Kelley 
has made a masterly statistical analysis of his problem. The problem 
he has attacked is that of unravelling the relative influences of nature
and nurture in causing such differences as being more gifted in one
mental function than another. To use extreme illustrations, what are
the relative contributions of nature and nurture toward the creation
of gifted linguists or exceptional spellers or lightning calculators?
Kelley is not concerned, however, with geniuses solely but with the 
idiosyncrasies of normal persons. A person’s ability in any function 
is the product of his native ability, maturity, nurture and other factors 
which come under the heading of chance. Maturity, which Kelley 
takes as synonymous (because it correlates perfectly) with chronologi- 
cal age, is eliminated by taking all scores as deviations from age norms 
of the age groups 8, 11 and 14 used in the study. Nature and nurture 
are given rigorous statistical definitions. Nature is “a trait which
does not change with age,” while nurture is an “influence, uncorrelated
with nature and making for individual differences which change
with the length of time or number of years through which it acts.”
In other words, to bring these terms into more familiar parlance,
nature is conceived as the factor which makes the IQ constant, something
which does not change with age, while nurture is the environmental
influence which causes the EQ to fluctuate above or below the
IQ.

Kelley uses, as data, scores on the Stanford Achievement Test.
Before these data can be made to yield the answer to the problem as 
defined they must be refined so as to eliminate (a) the effect of unreli- 
ability of the score; (b) the inequality of units of the test; (c) the non- 
equivalence of the three groups of children—8-year olds, 11-year olds 
and 14-year olds. 

His results come out in terms of which the following are samples:
97 per cent of adult differences between arithmetic reasoning and
spelling are due to original nature and 3 per cent to nurture, nurture
working to increase the native differences; or 32 per cent of adult
differences between ability with sentence meaning and science information
are due to original nature and 68 per cent to nurture, nurture
working to increase the native differences; or 63 per cent of the adult
differences between knowledge of word meaning and history and literature
information are due to original nature and 37 per cent to nurture,
nurture working to level or make less the native differences.

The educational implications of these findings are somewhat 
startling. Sometimes nurture increases native differences, sometimes 
decreases them. The particular functions whose differences are 
increased or decreased by nurture are not precisely the ones which 
educators would wish. Differences in language usage, for instance, 
are increased by nurture. Differences in computation, or reading 
ability, on the other hand, are decreased by nurture. The leveling 
influence of the school seems to work exactly where it should not. 

Kelley has some interesting things to say about mathematical 
ability and memoriter learning. Concerning the former he finds that 
the largest nurture factor in his results (which is a negative factor) 
is between computation and arithmetic reasoning. “Natively these
functions are far apart, but nurture brings them together.” From
this he jumps to the conclusion that “one must not conclude that great
computation ability is a prerequisite to high mathematical attainment
in advanced fields.” Concerning memoriter learning, Kelley
finds evidence from the large nurture influences causing differences
between paragraph meaning and spelling that children may develop
one of two attitudes toward printed matter: (a) an interest in its
meaning, (b) an interest in its structure. This he generalizes into a
possible tendency in many activities to use either models and examples
(an act of memory) or analysis and generalization (an act of reasoning)
in solving problems.

Concerning gifted children Kelley finds that idiosyncrasy decreases 
with increasing age indicating that “the public school is very effectively
eliminating oddity.” 

This and more is packed into a little book of 49 pages, together 
with two appendices giving the details of certain statistical problems 
and some of the basic data of the study. There would seem to be 
much in Kelley’s findings which has still to be interpreted, but he 
prefers to forego extended discussion that his main findings (that the 
school is leveling precisely those functions which common sense would 
suggest should be made more distinct, and vice versa) may have greater 
prominence. 

Scientific contributors usually take a former generalization and 
break it into many generalizations. Thorndike has said: “Insofar as 
the differences in achievement found amongst a group of men are due 
to differences in the quantity and quality of training which they have 
had in the function in question, the provision of equal amounts of 
the same sort of training for all individuals in the group should act 
to reduce the differences.” Kelley has refined this statement by 
showing in what direction nurture acts on nature and to what extent 
in a number of specific functions. 

When one compares Kelley’s study with previous studies of the 
nature-nurture problem one notes that the excellence of Kelley’s 
contribution lies in its profound statistical analysis. 

The New York Times prints in its weekly Book Section a list of
new books under a number of captions. One of these captions is
“Science and Psychology.” I would have no hesitation in offering
this study of Kelley’s in the field of psychology as a sample of scientific
technique.

PERCIVAL M. SYMONDS.
Teachers College, Columbia University.

A BRITISH TEXT EMPHASIZING EXPERIMENTATION


Experimental Psychology, by Mary Collins and James Drever. New 
York: E. P. Dutton and Co., 1926. Pp. VIII + 315. 


Workers in the field of experimental psychology will welcome this 
eminently readable contribution from the director of the Edinburgh
Laboratory and his able pupil. Here, at last, is a well constructed 
text which meets many of the demands of a first year course in the 
subject. 

The authors take the view that “the systematic presentation of
the science of psychology is almost wholly divorced from the experi- 
mental work of the psychologist,” which seems to be too strong a 
statement when one considers the recent texts of Woodworth, Hunter, 
and others. But the emphasis is properly placed and the point of 
view is maintained throughout. It is no “fireside” psychology which 
these Scotch investigators offer; they claim community of method with 
other sciences even if the difference in subject matter forbids the 
application of a centimeter-gram-second system of physical units. 

A brief discussion of psycho-physics introduces the reader to the 
methods of experimentation; this is followed by a useful statistical 
section in which the theoretical basis of correlation work is found in 
Mill’s fifth canon of induction, the famous principle of “concomitant
variation.” The chapter on vision is particularly full and the copious 
illustrative material and the clarity of presentation smooth the path 
of understanding for the novice or layman. The succeeding chapter 
on hearing suffers by comparison and apparently was written without 
knowledge of Ogden’s researches in audition. A consideration of 
the other “gateways of knowledge” leads into the material of perception.
At least one doctrine now so emphasized by the Gestalt
theorists, although it is not new, is unequivocally accepted by the 
incorporation of the view that “the relation between figure and ground 
is a necessary characteristic of perceptual experience” (p. 101). 

In their efforts at comprehensive treatment the authors have unfortunately
made their chapter on learning and memory entirely
too brief; their error, however, indicates the wealth of useful materials
concerned with the higher mental processes that have been accumulated
since the writing of Sanford’s and Titchener’s manuals. American
sources figure very prominently in the content of the volume and
suggest the claim that experimental psychology, originally so wholly
a German science, is now becoming an American one. However
this may be, this new book commends itself as a superior text for
instruction in elementary courses for its own real merits other than its
unintentional appeal to our national pride and vanity!


EDWIN MAURICE BAILOR.
Dartmouth College. 


A GUIDE TO PERFORMANCE TESTS FOR YOUNG CHILDREN


Performance Tests for Children of Pre-school Age, by Rachel Stutsman. 
Clark University, Worcester, Mass.: Genetic Psychology Mono- 
graphs, Vol. I, No. 1, 1926. Pp. 67. 


This first number of the genetic psychology monographs is another 
contribution to the field of performance tests for children below school 
age. The monograph is limited to a presentation of 19 performance 
tests for children under seven years of age, with decile scores by half 
years for 529 children. Six of the 19 tests were given to children as 
young as 18 months of age. 

The children tested were obtained from the waiting list of the Mer- 
rill Palmer nursery school, and from public school kindergartens, 
orphanages, day nurseries, child care agencies and public health clinics 
in Detroit. An attempt was made to get as representative a sampling 
as possible, although the group as a whole was probably slightly supe- 
rior mentally. 

A description of the test apparatus and methods of procedure 
takes up the major part of the volume. The methods and instructions 
are adequately and clearly given. The techniques for the Wallin 
pegboards and Seguin formboard differ somewhat from those previously 
reported upon for preschool children by Baldwin and Stecher. 

The monograph should prove a useful examiner’s guide for persons 
interested in performance tests for preschool children. No attempt is
made in this report to study the interrelations of the various tests or 
to compare the performances of the same child on various tests. 

State University of Iowa. BETH L. WELLMAN.


MENTAL HEALTH 


Mental Abnormality and Deficiency: An Introduction to the Study of 
Problems of Mental Health, by Sidney L. Pressey and Luella Cole 
Pressey. New York: The Macmillan Co., 1926. Pp. XII + 356. 


The authors have prepared a college text book for what might be 
called a course in applied abnormal psychology. The book offers a
useful compilation of much previously existing information about
mental disease and deficiency.

Part I gives attention to methods of case study and history taking, 
offering examples. Part II deals with the normal individual, “border- 
land” conditions, functional psychoses, organic psychoses and feeble- 
mindedness. The treatment of dementia praecox impressed the 
reviewer as much more readable, if necessarily less comprehensive, 
than treatments in other texts. Part III deals with evidence of the 
extent of mental disorder, its causes, treatment, and suggestions for 
mental hygiene. 

One of the unusual features of the discussion is the presentation of a 
description of the normal individual. According to the investigation 
of the authors: (1) The family of the normal individual is “likely 
to show at least one insane, feeble-minded, epileptic, alcoholic, or 
syphilitic individual in the near ancestry. (2) The individual’s own 
history is not to be considered unusual if he did not finish grammar 
school, if he was shown to have been involved in a few childish thefts, 
to have masturbated occasionally during childhood, to have had some 
adolescent sex experience, to have changed jobs somewhat frequently, 
to be in debt, or to have quarreled with his family. (3) An individual 
cannot be considered necessarily abnormal if he shows an IQ of 80 
(or if an adult, a mental age of 12), supposes it is Tuesday when it is 
really Thursday, cannot name the first president, appears slightly 
flighty or lacking in judgment, appears with no collar and a dirty 
shirt, or if he shows a suspicious attitude toward the examiner.” 

As with many others of its kind, the most obvious limitation of 
this book is its section on the treatment of borderland and more 
serious problems of mental health. Knowledge in the field of neuroses 
and psychoses is compounded mainly of techniques for classification 
and some principles of causation. It may be quite possible to diag- 
nose an acute sense of inferiority and to show real or presumptive 
relationships to childhood experiences, but suggestions for treatment 
are usually vague and disappointing. The authors have used
“psychoanalysis” as synonymous with the process of encouraging the
person to talk out his difficulty and then setting up some situations for
re-education. This is undoubtedly a good method of approaching 
certain emotional problems, but it surely would not be recognized by 
some professional psychoanalysts as equivalent to their involved 
technique for dealing with fixation and transference. The significance 
of the patient’s understanding of and participation in the treatment
process might perhaps have received more adequate emphasis. 

The discussion is illustrated by 37 cases, each of which has been 
outlined in accordance with a complete case study form, more suggestive
of the social case worker than of ordinary psychological practice.
While there is a danger in exaggerating the value of information
obtained without the complete cooperation of the patient, the authors
make the excellent point that psychologists have proceeded too often
to handle cases upon the basis of meagre information given by only 
one person under circumstances which would lead the evidence to 
be regarded as very doubtful by a careful social case worker. 

The book is enriched by a good glossary and an excellent anno- 
tated bibliography of 98 modern discussions of the problem of mental 
abnormality and deficiency. GOODWIN B. WATSON.

Teachers College, Columbia University. 







